INFORMATION AND ITS RETRIEVAL


ABSTRACT 

Information is here defined abstractly as a set of highly
structured elements. This structure induces a two-way classification
of the elements and also other relations among them. The definition
includes rules for expanding this information by incorporating new
elements and for discarding elements that have become obsolete or are no
longer needed. The definition of information is geared to its purpose,
viz. to provide easy retrieval of known facts that are of interest to a
specialist in the field.

A pilot example of such a retrieval system is included, and
principles for the physical realization of the system are presented.

It  is  stated  that  a  clear  distinction  must  be  made  between  a 
collection  of  documents  and  the  information  they  contain.  Likewise,  the 
difference  between  recovering  relevant  documents  and  retrieving  desired 
information  is  emphasised. 
I. INTRODUCTION


This  report  discusses  problems  in  the  field  of  information 
storage  and  retrieval.  For  years  numerous  libraries,  organizations  and 
individuals  have  devoted  their  resources  and  intelligence  to  the 
practice  and  methodology  of  these  problems.  Many  methods  have  been 
proposed  and  discussed;  some  of  them  have  been  tried  on  a  smaller  or 
larger  scale. 

Nevertheless  we  feel  that  this  report  contains  some  aspects  that 
are  not  a  mere  repetition  or  reformulation  of  ideas  discussed  elsewhere. 
First  of  all,  we  make  a  clear  distinction  between  documents  and 
information,  and  we  consider  the  problem  of  storage  and  retrieval  of  the 
latter.  Secondly,  we  propose  that  information  systems  should  be 
constructed  by  professionals.  This  is  just  the  opposite  of  an  attempt 
to  mechanize  the  classification  of  documents.  Thirdly,  we  do  not 
consider  available  technical  devices  that  can  be  used  to  record  and 
recover  information.  Rather  we  discuss  the  functions  that  should  be 
performed  by  a  retrieval  system. 

Certainly, documents such as books, reports, charts, and maps are a
source  of  information.  Therefore  it  is  extremely  important  to  store, 
to  classify,  and  to  catalog  them.  However  we  do  not  suggest  any  novel 
approach  to  this  problem,  nor  do  we  consider  any  methods  to  improve 
existing  means  for  storing  and  retrieving  documents.  Our  concern  is 
with  information,  and  we  contend  that  documents  do  not  constitute 
information.  In  fact,  quite  a  few  documents  do  not  contain  any 
information,  in  spite  of  their  eye-catching  titles  and  summaries.  Hence 
no  sorting,  classifying,  and  labeling  of  such  documents  will  create  an 
information system. For this reason, among others, Baxendale's [1]
counting of "non-general words" and Borko and Bernick's [2] linear
regression formula can yield only a collection of documents that is
hardly relevant to one's request for information. In many cases there
is, indeed, a very low correlation between the frequency of certain words
in a document and the information contained or not contained in it, or
between Luhn's ill-defined notions [3] with their statistics and what
is  really  said  in  the  document.  You  may  find  a  paper  that  mentions 
electron-volt  more  than  a  hundred  times  and  yet  says  nothing  about  its 
relation  to  engineering  units  of  energy.  Yet  another  document  may 
mention  electron-volt  only  once  and  right  there  may  express  it  in  BTU's. 
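To make the electron-volt example concrete, here is a toy sketch in Python. The two documents are invented, a raw word count stands in for frequency-based indexing, and a check for the word "BTU" stands in for a specialist's reading; none of this reproduces the actual procedures of [1], [2], or [3].

```python
# Hypothetical documents: paper_A repeats the word, paper_B states the conversion.
docs = {
    "paper_A": "electron-volt " * 120 + "spectra measured in electron-volt units",
    "paper_B": "one electron-volt is about 1.52e-22 BTU",
}

def rank_by_frequency(docs, word):
    """Rank documents by raw count of `word` (a stand-in for frequency-based indexing)."""
    counts = {name: text.split().count(word) for name, text in docs.items()}
    return sorted(counts, key=counts.get, reverse=True)

def states_conversion(text):
    """Crude stand-in for a specialist's reading: does the text mention BTU at all?"""
    return "BTU" in text.split()

ranking = rank_by_frequency(docs, "electron-volt")
print(ranking)                                                   # ['paper_A', 'paper_B']
print([name for name in ranking if states_conversion(docs[name])])  # ['paper_B']
```

The document ranked first by frequency (121 occurrences) is not the one that answers the question.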

We cannot separate information and intelligence. Since neither a
file nor an electronic system can be intelligent, such devices cannot
automatically retrieve desired information from a heap of relevant and
irrelevant documents. Nor can a clerk fare any better when he encounters
technical language completely foreign to him. Consequently, we need a
professional designer to extract information from its source and to
present it to the user by the most efficient means, most efficient over
a span of time and a number of requestors.

Here "most efficient" is meant not with respect to
available means but with respect to the goal of information retrieval.

We did not survey all the forms of sea-going ships and all the shapes
of  winged  aircraft  in  a  search  for  a  conveyance  to  carry  us  to  the  moon. 
Instead,  we  looked  first  for  what  it  takes  to  go  there,  and  then  we 
examined  whether  it  was  feasible  and  how  much  it  might  cost.  The  same 
approach  is  proposed  here  to  the  problem  of  information  retrieval: 
consider  first  what  is  needed  to  achieve  the  goal,  and  afterwards 
examine  the  feasibility  of  any  proposed  approach.  Of  course,  there  is 
more  than  one  way  to  build  and  to  stage  a  spaceship.  We  do  not  claim 
that  the  approach  to  retrieval  of  information  discussed  in  this  report 
is  necessarily  the  best  or  even  one  of  the  better  ones.  However,  we 
feel that our plan of attack is one of the right ones.

We  propose  first  to  define  information.  The  definition  must  be 
constructed according to the purpose of the information system. Such a
definition is presented in Section II. It is followed by an example in
Section III that illustrates the abstract definition in concrete terms.
Section IV attempts to present the questions that summarize various
requests  for  information  on  a  subject.  Furthermore,  Section  IV  discusses 
ways  to  obtain  answers  to  these  questions.  We  hope  this  discussion 
provides  motivation  for  the  definition  of  information  presented  in 
Section  II  and  also  establishes  a  link  with  Section  V  which  presents 
principles  for  the  physical  realization  of  an  information  system. 

Section  VI  attempts  to  compare  the  effort  needed  to  construct  the 
proposed  system  with  the  need  for  efficient  retrieval  of  information. 

II.  INFORMATION 

Frequently  a  man  busy  with  his  research  task  would  like  to  obtain 
needed  facts  without  having  to  search  for  hours  through  various 
references.  However,  at  other  times  he  may  want  to  range  through  the 
field  of  his  specialty  without  looking  for  particular  information.  He 
may  just  want  to  wander  around  and  see  what  is  going  on  in  his  field 
of  research.  Hence,  even  if  specialized  and  efficient  information 
systems  were  developed  we  still  would  need  libraries.  It  seems  that 
two  types  of  libraries  would  be  desirable.  First  we  should  have 
conventional libraries serving users in their neighborhood. Here you
could  get  a  hard  copy  of  a  document  to  browse  through  in  the  familiar 
manner  that  you  learned  long  ago,  beginning  with  kiddies’  coloring  books. 
The other type should be a library of the future, one that does not
exist  now  and,  most  likely,  is  not  feasible  yet. 

In  this  library  of  the  future  books,  magazines,  journals,  maps, 
reports,  charts,  graphs,  and  any  other  conceivable  types  of  documents 
should be put on some kind of ultramicrofiche, all of which would be
mounted on a superprocessor with thousands of parallel channels that
would  provide  a  simultaneous  and  immediate  read-out  of  many  thousands 
of  documents.  The  channels  should  be  connected  to  remote  access 
stations  at  which  documents  of  your  choice  would  be  displayed  on  screens 
and  also  copied  if  necessary. 
This central library should contain a considerable portion of the
documents in all fields of knowledge that are published or accepted for
publication. Of course, not all documents should be included. Let us
recall that the Library of Congress, holding probably the world's largest
collection  of  documents,  selects  only  about  two  million  books  and  other 
items  out  of  an  annual  supply  of  a  few  billion  pieces  that  reach  the 
Library.  The  library  of  the  future  should  be  no  less  selective.  It 
should  not  be  a  dump  for  every  piece  of  printed  matter. 

Immediate  access  to  the  material  in  this  library,  perhaps,  would 
allow  us  to  do  away  with  many  technical  journals.  Editors  of  these 
journals  could  become  a  part  of  the  library  team  that  selects  the 
documents  to  be  stored.  Then  a  researcher  in  any  field  would  not  have 
to wait one to three years to read what new thing has been found by
his  fellow  researcher  who  works  in  the  next-door  institution,  as  he  has 
to  wait  now  with  the  present  methods  of  refereeing,  editing, 
publishing,  and  distributing  of  technical  material. 

Present  methods  for  classifying  and  indexing  documents  would  hardly 
be  adequate  for  such  a  superlibrary  of  the  future.  Therefore  an 
adequate  number  of  remote  access  stations  should  be  rigged  to  record  the 
steps  taken  by  users  in  an  attempt  to  retrieve  documents  of  a  desired 
type.  One  should  record  both  the  ingenuity  and  the  frustration  of 
individual users. These data should be collected by field of knowledge,
by type of document, and by class of users. Careful and continuous
analysis of such data would allow a continuous improvement of the
software and the hardware of the system. It also would serve as
guidance  in  the  future  selection  of  documents  for  storage  in  the 
library.  Obviously,  no  system  can  add  new  material  forever  without 
removing  something  that  is  already  in  the  system.  Collected  data  that 
would  show  what  is  being  used  and  how  would  also  provide  criteria  for 
removal  of  material. 

Similar data collected for any existing document retrieval system
would reveal its weak and strong points and at the same time would help
find ways for improvement. An effort spent in collecting such data
may be very worthwhile. There is no way to study reality other than to
conduct  actual  observations  of  facts.  This  includes  the  operation  of  a 
document  retrieval  system,  as  well. 

We  are  so  accustomed  to  books,  journals,  and  conventional  libraries 
that  it  seems  absolutely  impossible  to  do  away  with  them.  This  report 
is  primarily  concerned  with  information,  not  documents.  Nevertheless 
the  preceding  paragraphs  about  libraries  were  included  simply  to  express 
our  prejudice  against  spoon-fed  information.  Seemingly,  many  researchers 
still  feel  -  at  least  subconsciously  -  that  they  should  read  for 
themselves  everything  published  in  their  field  just  as  researchers  felt 
fifty  or  a  hundred  years  ago.  Or  at  least  they  feel  that  they  should 
be  able  to  read  anything  they  choose,  including  trivial  verbiage  published 
in  their  field  of  knowledge.  However,  this  may  not  be  the  most 
favorable  condition  for  the  advancement  of  research  nor  the  most 
advantageous situation for an individual researcher. It is quite
possible  that  sufficiently  diversified  and  adequately  complete 
information  systems  containing  only  significant  results  -  significant  by 
the  judgment  of  selected  groups  of  scientists  -  would  satisfy  the  needs 
of  a  research  community,  and  hence  there  would  be  no  need  for  a 
superlibrary as described above. Only because of the prejudice of the
author for libraries and because of his do-it-yourself attitude was the
above  digression  made. 

This  digression  may  also  serve  to  emphasize  the  contrast  between 
an  all-inclusive  (or  almost  all-)  library  and  a  specialized  information 
system  that  should  provide  up-to-date  knowledge  accessible  to  a 
specialist  only  and  that  could  help  him  to  plan  further  steps  into  the 
unknown  and  also  would  reduce  duplications.  In  other  words,  an 
information  system  should  be  highly  specialized,  and  it  should  be  able 
to  respond  to  the  queries  put  forth  by  a  specialist  in  the  field.  Indeed, 
there  is  no  point  in  building  an  information  system  on,  say,  partially 
ordered  spaces  that  would  provide  adequate  answers  to  questions  asked  by 
a  lawyer  who  never  learned  his  high  school  algebra.  Only  a  specialized 
information system designed for a specialist can be useful in research.

Examples  of  specialized  systems  that  are  concerned  with  information 
and  not  with  documents  are  the  original  Chemical  Information  and  Data 
System [4] and American Airlines' SABRE Electronic Reservation System
[5]. Differences in the structure and in the methods of implementation
of  these  two  systems  indicate  that  an  information  system  depends  on  the 
type  of  information  involved  and  on  the  purpose  of  retrieval.  Most 
likely,  no  single  description  of  an  information  system  could  cover  all 
possible  areas.  Each  field  of  knowledge,  or  at  least  each  family  of 
related branches of science, may need a different definition of
information and also different methods for storage and retrieval.
Development of several information systems may provide material for
study  of  common  principles  and  may  lead  to  discovery  of  more  efficient 
methods  of  retrieval. 

Branches  of  mathematics  are  the  best  organized  logical  systems. 
Therefore  it  seems  that  information  in  mathematical  disciplines  may  be 
easier  to  study  and  a  desired  mode  of  retrieval  may  be  simpler  to 
describe  than  in  other  fields  of  knowledge  such  as,  for  instance, 
political  science  or  philosophy,  where  an  appeal  to  the  emotion  of  the 
reader by the rhetorical garb of presentation frequently outweighs the
logic  and  the  reasoning  of  the  argument.  Consequently,  we  choose  the 
field  of  mathematics  for  our  study  of  the  principles  of  information. 

A  mathematical  discipline  consists  of  undefined  terms,  among  many 
other  things.  Some  of  these  terms  are  primitive  concepts  such  as  lines 
or points in Euclidean geometry. Other concepts are primitive relations.
For instance, "every pair of distinct points defines a single line" is
an undefined relation between a pair of points and a line in a Euclidean
space. 

We start with the set of all such undefined terms in a given
system. This set need not and should not be as small as logically
possible, i.e. it should contain terms that cannot be defined as well
as terms that are commonly known to a specialist in the field and hence
need not be defined. A person or a group of persons that would
undertake the development of an information system should decide which
terms are "commonly known" and which are not. For short we will call
such a person or a group a designer.

The  designer  should  also  choose  what  to  include  in  the  system  and 
what  not  to.  New  developments  in  the  field  should  be  screened  by  him 
and  either  included  in  the  information  system  or  rejected  as  trivial, 
irrelevant,  or  not  new.  This  is  an  arbitrary  procedure.  It  is  possible 
that  some  gems  of  discovery  will  never  appear  in  the  system  and  some 
trivia  will  be  included.  Well,  nothing  is  perfect.  Nor  should  one 
strive  to  construct  a  perfect  information  system.  It  seems  that  the 
most  desirable  information  retrieval  system  would  be  an  intelligent 
expert  who  possesses  the  experience  of  several  scholars  and  who  can 
read,  screen,  and  evaluate  as  much  new  material  as  a  score  of  diligent 
students  combined.  However,  such  a  superexpert  still  would  have  his 
prejudices,  likes,  and  dislikes,  his  oversights,  and  exaggerations.  Yet 
we  would  be  happy  to  have  such  a  consultant,  especially  if  the  only 
credit required for his service would be what a researcher now gives to
the  library. 

We imagine our information system as a sort of superhandbook
that  is  up-to-date,  complete,  and  so  well  organized  that  any  information 
contained  in  it  can  be  retrieved  with  great  ease.  At  present  we  have 
competing  handbooks  and  monographs  in  many  fields  of  knowledge.  There 
is  no  reason  why  competing  information  systems  should  not  be  constructed. 
Similar  competing  commercial  information  systems  are  visualized  in  the 
future. Some of them are described in [6]. Competition would
encourage  us  to  design  information  systems  that  suit  the  needs  of  users 
instead  of  merely  the  taste  of  the  designer. 

After  these  remarks  we  return  to  our  definition  of  an  information 
system.  As  we  choose  the  undefined  terms  to  be  included  in  our 
information,  we  should  also  consider  synonyms.  Thus,  we  introduce 
elements  that  consist  of  two  components:  a  main  term  and  the  set  of 
its synonyms. Furthermore we introduce a label for each element. We
call this label the "address of the element". It consists of three
components: (d, 0, i). The symbol i is defined below. We abbreviate this
to $d_i^0$. This symbol will be used with two different meanings: the
address of an element and the element itself, i.e. we write
$d_i^0 = (d_i^0, (d_i^0))$, where $d_i^0$ is the main term and $(d_i^0)$ is
the set of synonyms. Thus, an element $d_i^0$ of our system that corresponds
to an undefined notion or an undefined relation has two components: the
first component is the main term, and the second component is the set of
synonyms, which may be empty. We call the elements of $(d_i^0)$ primary
synonyms. The collection of all such pairs (together with their addresses)
is denoted by ${}_0D$. This collection we call information of level zero.
We will also use the symbol $I_0 = {}_0D$.
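For readers who think in data structures, a level-zero element can be pictured as a small record. The following Python sketch is only an illustration under our own assumptions: the class names `Address` and `ZeroLevelElement` and the sample terms are hypothetical, and the third address component is stored as an exact fraction.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass(frozen=True)
class Address:
    """Hypothetical address (type letter, level, position), e.g. ('d', 0, 1/4)."""
    kind: str
    level: int
    position: Fraction

@dataclass
class ZeroLevelElement:
    """Hypothetical element of level zero: a main term and its primary synonyms."""
    address: Address
    main_term: str
    primary_synonyms: frozenset = field(default_factory=frozenset)

# Two sample undefined terms of Euclidean geometry; "line" precedes "point"
# alphabetically, so it receives the smaller third component.
line = ZeroLevelElement(Address("d", 0, Fraction(1, 4)), "line", frozenset({"straight line"}))
point = ZeroLevelElement(Address("d", 0, Fraction(3, 4)), "point")

# Information of level zero, keyed by address.
I0 = {element.address: element for element in (line, point)}
print(I0[Address("d", 0, Fraction(1, 4))].main_term)   # line
```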

The first two components of an address are d and 0 for every
element of ${}_0D$. The third component, i, is a real number of the form
$i = a/2^n$, where $a$ and $n$ are positive integers and $a < 2^n$. We form
an alphabetic list of all the main terms and assign a third component to
each element in such a way that these components form a monotone strictly
increasing sequence. The alphabetic list of main terms together with their
addresses is called the ${}_0D$-dictionary.
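The text does not say why the third component is a binary fraction. One plausible reason, which we assume here, is that a new term can always be slotted between two alphabetical neighbors without renumbering the others, since the midpoint of two numbers of the form $a/2^n$ is again of that form. A minimal Python sketch under that assumption (the function names are hypothetical):

```python
from bisect import bisect_left
from fractions import Fraction

def initial_positions(terms):
    """Assign third components a/2**n (0 < a < 2**n), increasing alphabetically."""
    ordered = sorted(terms)
    n = len(ordered).bit_length()              # smallest n with 2**n > len(ordered)
    return {term: Fraction(k + 1, 2 ** n) for k, term in enumerate(ordered)}

def insert_term(positions, term):
    """Give a new term a position between its alphabetical neighbors (assumed rule)."""
    if term in positions:
        return positions[term]
    ordered = sorted(positions, key=positions.get)   # alphabetical, by the invariant
    k = bisect_left(ordered, term)
    low = positions[ordered[k - 1]] if k > 0 else Fraction(0)
    high = positions[ordered[k]] if k < len(ordered) else Fraction(1)
    positions[term] = (low + high) / 2         # midpoint of two dyadic numbers is dyadic
    return positions[term]

positions = initial_positions(["point", "line", "plane"])
print(positions["plane"])                     # 1/2
print(insert_term(positions, "parallel"))     # 3/8
```

Existing addresses never change, which matters if other elements already refer to them.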

Besides the set $(d_i^0)$ of primary synonyms we also consider the set
of secondary synonyms of $d_i^0$. This set may be empty or not. It contains
some rare or old synonyms such as, for instance, "analysis situs" for
topology and some likely misspellings of the main term or primary
synonyms. We form an alphabetic list that includes all main terms,
primary and secondary synonyms, each labeled with the respective
address. We call this list the ${}^0D$-dictionary (or a $D_0$-dictionary).
As stated above the set ${}_0D$ (or $I_0$) is information of level zero. The
set $I_0$ together with the ${}_0D$- and ${}^0D$-dictionaries, with some means
of recording the elements and with the rules for adding new elements, for
deleting some of them, and for retrieving constitutes an information
system of level zero. The operation rule that provides the address of
an element when the main term or any of its synonyms is specified is
called the element translator. This rule may be simply a look-up in
the ${}^0D$-dictionary.
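The element translator is described as a plain look-up in the dictionary of main terms and all their synonyms. A Python sketch follows; the sample elements, their addresses, the case-insensitive matching, and the assumption that no word belongs to two elements are ours, not the report's.

```python
def build_synonym_dictionary(elements):
    """Alphabetical list of main terms, primary and secondary synonyms, each
    mapped to the address of its element (case-folded for matching)."""
    entries = {}
    for address, (main_term, primary, secondary) in elements.items():
        for word in (main_term, *primary, *secondary):
            entries[word.casefold()] = address   # assumes no word names two elements
    return dict(sorted(entries.items()))

def translate(dictionary, word):
    """Element translator: return the address for a term or synonym, else None."""
    return dictionary.get(word.casefold())

# Hypothetical level-zero elements: address -> (main term, primary, secondary synonyms).
elements = {
    ("d", 0, 0.25): ("eigenvalue", {"characteristic value", "proper value"}, {"eigen value"}),
    ("d", 0, 0.5): ("topology", set(), {"analysis situs", "topolgy"}),
}
dictionary = build_synonym_dictionary(elements)
print(translate(dictionary, "Analysis Situs"))   # ('d', 0, 0.5)
```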

We define information of level n inductively. Suppose that
information $I_{n-1}$ of level $n-1$ has been defined. We assume that

$$I_{n-1} = D_{n-1} \cup P_{n-1} \cup C_{n-1} \cup S_{n-1} \cup T_{n-1},$$

the logical sum of five sets to be described below. For $n = 1$ we set
$D_{n-1} = D_0 = {}_0D$ and $P_0 = C_0 = S_0 = T_0 = \emptyset$ (the empty set).
Let $[d_i^n]$ be a verbal statement of a definition of a concept $d_i^n$
expressed in terms of words contained in the $D_{n-1}$-dictionary
or in terms of their semantic equivalents (such as plural or possessive
form, past tense, etc.) together with logical connectives, such as "or",
and quantifiers such as "there exist", and also with any non-technical
terms (non-technical with respect to the particular field of information)
such as "of", "for", etc. In other words $[d_i^n]$ is a definition of the
term $d_i^n$ freely chosen by the designer of the information system with the
single restriction that this definition must not use any technical
terms (no matter how common and simple) except those contained in the
$D_{n-1}$-dictionary.
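The single restriction on a definition lends itself to a mechanical check. The Python sketch below flags every word of a verbal definition that is neither in the vocabulary of lower-level terms nor in a list of non-technical words. Both word lists are hypothetical, and the plural rule is only a crude stand-in for the "semantic equivalents" the text allows.

```python
import re

# Hypothetical vocabulary of lower-level technical terms (synonyms included)
# and a hypothetical list of non-technical words.
TECHNICAL = {"set", "element", "mapping", "one-to-one", "onto"}
NON_TECHNICAL = {"a", "an", "the", "of", "for", "is", "that", "which",
                 "each", "and", "or", "there", "exists", "to", "from"}

def is_admissible(word):
    """Accept known words and, crudely, the plural of a known technical term."""
    if word in TECHNICAL or word in NON_TECHNICAL:
        return True
    return word.endswith("s") and word[:-1] in TECHNICAL

def offending_terms(statement):
    """Words that violate the restriction; an empty list means the definition passes."""
    words = re.findall(r"[a-z][a-z-]*", statement.lower())
    return sorted({w for w in words if not is_admissible(w)})

print(offending_terms("a mapping of a set onto a set that is one-to-one"))  # []
print(offending_terms("a continuous mapping of sets"))                      # ['continuous']
```

A flagged word means either the definition must be reworded or the word must first be defined at a lower level.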

We assume that every definition consists of a generic term t and a
qualifying statement. The term t and any technical term used in the
qualifying statement are in the $D_{n-1}$-dictionary. Hence each such term
is associated with a corresponding address $d_j^k$, $k < n$. We denote by
${}_nD_i$ the set whose single element is the address of t and by $D_i^n$ the
set of addresses of all the terms contained in the qualifying statement. We
also choose the set $(d_i^n)$ of primary synonyms of the term $d_i^n$ and the
set of its secondary synonyms. Next we choose a document reference
which, in our judgment, contains a satisfactory discussion of all or some
of the following aspects: motivation of the definition and of the choice
of the term, a proof that the definition is non-vacuous and meaningful,
and a list of other pertinent references. We denote this reference by
$\{d_i^n\}$, i.e. $\{d_i^n\}$ consists of an author's name, the title of his
work, page, date, etc. Finally, we assign an integer $\bar d_i^n$ to the term
$d_i^n$ and call that integer an index of rescission. This is a control
parameter for removing elements by the rules described below. Thus, we
construct a septemplet

$$(d_i^n,\ (d_i^n),\ {}_nD_i,\ D_i^n,\ [d_i^n],\ \{d_i^n\},\ \bar d_i^n).$$

We assign an address to this element as follows: the first component is d;
the second component is obtained by adding one to the maximum of the second
component of all addresses contained in ${}_nD_i$ and $D_i^n$; the third
component i is assigned in a similar fashion as for the elements of ${}_0D$.
The collection of all these septemplets, together with their addresses,
where the second component of the address is a fixed value n, is the set
${}_nD$. The alphabetical list of all the terms $d_i^n$ together with their
addresses is the ${}_nD$-dictionary. We also have the ${}^nD$-dictionary that
includes the main terms and all their synonyms along with the
address of each. Of course, we assume again that a translator, i.e.
a rule for obtaining the address of any main term or its synonym, is
available.
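The septemplet and the rule for its second address component can be sketched as follows. This is a Python illustration under our own assumptions: the field names, the sample definitions of "subset" and "proper subset", the placeholder reference strings, and the rescission index 5 are hypothetical, and the third address component is left symbolic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    kind: str        # 'd', 's', 't', 'p' or 'c'
    level: int       # second component
    position: str    # third component, kept symbolic in this sketch

@dataclass(frozen=True)
class Septemplet:
    name: str
    primary_synonyms: frozenset
    generic: frozenset       # address of the generic term (antecedent for theorems)
    qualifying: frozenset    # addresses in the qualifying statement (consequent)
    statement: str           # verbal statement
    reference: str           # bibliographical reference
    rescission_index: int

def level_of(element):
    """Second address component: one more than the highest level referenced."""
    return 1 + max(a.level for a in element.generic | element.qualifying)

SET, ELEMENT = Address("d", 0, "set"), Address("d", 0, "element")
subset = Septemplet("subset", frozenset(), frozenset({SET}), frozenset({ELEMENT, SET}),
                    "a set each element of which is an element of a given set",
                    "reference chosen by the designer", 5)   # 5: arbitrary index
SUBSET = Address("d", level_of(subset), "subset")
proper = Septemplet("proper subset", frozenset(), frozenset({SUBSET}), frozenset({SET}),
                    "a subset of a given set that is not equal to that set",
                    "reference chosen by the designer", 5)
print(level_of(subset), level_of(proper))   # 1 2
```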

Besides  definitions,  a  discipline  in  mathematics  contains 
postulates,  conjectures,  and  theorems.  The  logical  form  of  a  theorem 
is  an  implication,  i.e.  the  main  logical  connective  in  a  theorem  is 
"if"  or  "if  and  only  if".  We  call  the  first  type  of  theorem  unsymmetric 
or  of  the  t-type,  and  the  second  type  of  theorem  symmetric  or  of  the 
s-type.  Obviously,  existence  theorems  such  as  "there  exists  a  unique 
solution of Lu = 0" can readily be expressed in the form of an implication
such as: "If L is an operator with the property P, then there exists
a unique element u such that Lu = 0". Similarly, postulates and
conjectures can be expressed in the form of an implication. Thus, for
instance, Peano's axiom "One is a natural number" can be stated as
"If an element is one, then it belongs to the set of natural numbers".

We define the elements of information of level n that correspond
to symmetric theorems in a fashion very similar to that used in our
definition of elements in the set ${}_nD$. Namely, we denote by $[s_i^n]$ a
verbal statement of a symmetric theorem. The symbol $s_i^n$ is an
identification symbol such as "Cauchy Theorem" etc. If the theorem has
no name, the symbol is simply its address in the information. The
set $(s_i^n)$ of primary synonyms and the set of secondary synonyms are
chosen as before. The main connective in a theorem divides it into two
parts. We choose one of these parts as the antecedent (either part may
serve, since the theorem is symmetric) and the other as the consequent.
The addresses of the terms contained in the antecedent constitute the set
${}_nS_i$, and those belonging to the terms in the consequent constitute the
set $S_i^n$. Of course, these are the addresses of terms in the
$D_{n-1}$-dictionary. Again we choose a reference $\{s_i^n\}$ and a rescission
index $\bar s_i^n$ in a fashion similar to that used in the case of elements
of ${}_nD$. Thus, we obtain

$$s_i^n = (s_i^n,\ (s_i^n),\ {}_nS_i,\ S_i^n,\ [s_i^n],\ \{s_i^n\},\ \bar s_i^n).$$

The collection of all such elements of level n is denoted by ${}_nS$.


Similarly, we define the set ${}_nT$ of unsymmetric theorems, the set ${}_nP$
of postulates, and the set ${}_nC$ of conjectures. Elements of these sets
consist of seven components each and have addresses of the type $t_i^n$,
$p_i^n$, and $c_i^n$, respectively. We also construct an ${}_nS$-dictionary and
an ${}^nS$-dictionary, similar to the ${}_0D$- and the ${}^0D$-dictionaries.
Similarly, we have dictionaries for ${}_nT$, ${}_nP$, and ${}_nC$. Now the
information of level n is

$$I_n = I_{n-1} \cup {}_nD \cup {}_nS \cup {}_nT \cup {}_nP \cup {}_nC.$$

We also define the sets

$$D_n = \bigcup_{k=0}^{n} {}_kD, \quad S_n = \bigcup_{k=0}^{n} {}_kS, \quad T_n = \bigcup_{k=0}^{n} {}_kT, \quad P_n = \bigcup_{k=0}^{n} {}_kP, \quad C_n = \bigcup_{k=0}^{n} {}_kC.$$
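The bookkeeping of levels (the level-n sets of each type, their cumulative unions over levels, and the information of each level) can be sketched in a few lines of Python. Each element is represented only by its address (type letter, level, position); the sample addresses and the function name are hypothetical. The sketch computes $I_n$ as the union of the five cumulative sets, which coincides with adding the level-n sets to $I_{n-1}$.

```python
from collections import defaultdict

KINDS = "dstpc"   # definitions, symmetric and unsymmetric theorems, postulates, conjectures

def build_levels(addresses):
    """Return (cumulative, information): cumulative[(A, n)] is the union of the
    level-k sets of type A for k = 0..n, and information[n] is I_n."""
    per_level = defaultdict(set)                 # (A, k) -> elements of type A, level k
    for address in addresses:
        kind, level, _ = address
        per_level[kind, level].add(address)
    top = max(level for _, level, _ in addresses)
    cumulative, information = {}, {}
    for n in range(top + 1):
        for kind in KINDS:
            previous = cumulative.get((kind, n - 1), set())
            cumulative[kind, n] = previous | per_level[kind, n]
        information[n] = set().union(*(cumulative[kind, n] for kind in KINDS))
    return cumulative, information

addresses = [("d", 0, 0.25), ("d", 0, 0.5), ("d", 1, 0.5), ("s", 1, 0.5), ("t", 2, 0.5)]
cumulative, information = build_levels(addresses)
print(len(information[0]), len(information[1]), len(information[2]))   # 2 4 5
```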


Frequently we will discuss the elements of information without
specifying their type. Hence we introduce the notation $a_i^n$, $(a_i^n)$,
${}_nA_i$, $A_i^n$, etc., with a denoting either d (definition), or s
(symmetric theorem), or t, or p, or c. Similarly, A stands for any of the
five capital letters D, S, T, P, C. Likewise, $b_j^l$ etc. denote generic
elements of the information. We also assume that we have $A_n$-dictionaries
that are alphabetical lists of main terms and all the synonyms contained
in the elements of $A_n$ together with their addresses. We also construct
an $I_N$-dictionary that is obtained by combining the $A_N$-dictionaries for
all five values of A. Here N denotes the highest level of the elements
contained in the information. We call this $I_N$-dictionary the main
dictionary of the information.


The structure of the elements of information induces certain
relations. We say that $a_i^k$ and $b_j^l$ are in a relation of order zero to
each other if there exists an element $a_z^m$ such that
$a_i^k, b_j^l \in {}_mA_z \cup A_z^m$. The element $a_z^m$ is in a left relation
of order one to the element $a_i^k$, provided $a_i^k \in {}_mA_z$; $a_z^m$ is in a
relation of order one to the element $a_i^k$, provided $a_i^k \in A_z^m$. If
$a_z^m$ is in a (left) relation of order one to $a_i^k$, then we say that $a_i^k$
is in a (left) relation of order minus one to $a_z^m$. The verbal statement
$[a_z^m]$ is called a natural relation between $a_i^k$ and $a_z^m$ (of order one
or order minus one), and also it is called a natural relation of order zero
between $a_i^k$ and $b_j^l$.
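Assuming the reading that an element is in a relation of order one to each address in its consequent (qualifying) set, in a left relation of order one to each address in its antecedent (generic) set, and that the inverse relations have order minus one, the natural relations of low order can be computed directly from the elements. A Python sketch; the sample element and the string addresses are hypothetical.

```python
from itertools import combinations

# Hypothetical elements: address -> (antecedent/generic set, consequent/qualifying set,
# verbal statement).
ELEMENTS = {
    "d:subset": ({"d:set"}, {"d:element", "d:set"},
                 "a set each element of which is an element of a given set"),
}

def order_one(elements):
    """Triples (z, i, side): z is in a (left or right) relation of order one to i."""
    pairs = set()
    for z, (left, right, _) in elements.items():
        pairs |= {(z, i, "left") for i in left}
        pairs |= {(z, i, "right") for i in right}
    return pairs

def order_minus_one(elements):
    """Inverse triples (i, z, side): i is in a relation of order minus one to z."""
    return {(i, z, side) for z, i, side in order_one(elements)}

def order_zero(elements):
    """Pairs of distinct components of one element, each mapped to the verbal
    statements (natural relations) that connect them."""
    pairs = {}
    for left, right, statement in elements.values():
        for a, b in combinations(sorted(left | right), 2):
            pairs.setdefault((a, b), []).append(statement)
    return pairs

print(sorted(order_zero(ELEMENTS)))   # [('d:element', 'd:set')]
```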


Besides these natural relations of order minus one, zero, or one,
we define relations of higher (positive or negative) order inductively
as follows. Suppose that $a_i^k$ is in a relation of order $n_1$ to the
element $a_z^m$ and that $a_z^m$ is in a relation of order $n_2$ to the
element $b_j^l$. Then we say that $a_i^k$ is in a relation of order
$n_1 + n_2$ to $b_j^l$ provided $n_1 n_2 > 0$, i.e. provided $n_1$ and $n_2$
have the same sign.
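Since relations of the same sign add up, a relation of order k > 0 is obtained by chaining k relations of order one, and similarly for negative orders. A Python sketch that composes relations stored as sets of ordered pairs; the sample pairs (x, y), meaning "x is in a relation of order one to y", are hypothetical.

```python
def compose(r, s):
    """Pairs (x, y) such that x r z and z s y for some z."""
    successors = {}
    for z, y in s:
        successors.setdefault(z, set()).add(y)
    return {(x, y) for x, z in r for y in successors.get(z, ())}

def relation_of_order(one, k):
    """Pairs in a relation of order k (k >= 1), built from the order-one pairs by
    the rule: orders n1 and n2 of the same sign combine to order n1 + n2."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    result = set(one)
    for _ in range(k - 1):
        result = compose(result, one)
    return result

def relation_of_negative_order(one, k):
    """Pairs in a relation of order -k, using the inverse (order minus one) pairs."""
    return relation_of_order({(y, x) for x, y in one}, k)

# Hypothetical order-one pairs (x, y): x is in a relation of order one to y.
ONE = {("proper subset", "subset"), ("subset", "set"), ("subset", "element")}
print(sorted(relation_of_order(ONE, 2)))
# [('proper subset', 'element'), ('proper subset', 'set')]
```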


A finite sequence of elements $\ell_0, \ell_1, \ldots, \ell_k$ is called a k-chain
of elements provided each element of the sequence (except the last one)
is in a relation of order one to the next element. Similarly, the
sequence $\ell_0, \ell_1, \ldots, \ell_k$ is called a minus k-chain if every
element of the sequence (except the last one) is in a relation of order
minus one to its neighbor on the right. The sequence of natural relations
of order one (or minus one, respectively) between the successive elements
of a k-chain (minus k-chain) of elements is called a k-chain (minus
k-chain) of relations. It is easy to prove the following theorem by
induction on k.

Suppose $\ell_0$ is in a relation of order k (minus k) to $\ell_k$. Then there
exists a k-chain (minus k-chain) of elements that starts with $\ell_0$ and
ends with $\ell_k$.

We say that this is a k-chain (or minus k-chain) of elements from
$\ell_0$ to $\ell_k$. Similarly, the corresponding k-chain of relations is a
k-chain of relations from $\ell_0$ to $\ell_k$.
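The theorem also suggests how a chain can be found in practice: a depth-limited search along relations of order one. The following Python sketch returns one k-chain of elements from a given start to a given end, or None; the order-one pairs are hypothetical, and a minus k-chain is found by searching the inverted pairs. The search is exponential in the worst case; we assume this is acceptable because, when each step of order one leads to a lower level, k cannot exceed the number of levels.

```python
def k_chain(one, start, end, k):
    """Return [l0, ..., lk] with l0 = start, lk = end and each element in a
    relation of order one to the next (pairs taken from `one`), or None."""
    successors = {}
    for x, y in one:
        successors.setdefault(x, []).append(y)

    def extend(chain):
        if len(chain) == k + 1:
            return chain if chain[-1] == end else None
        for nxt in sorted(successors.get(chain[-1], [])):
            found = extend(chain + [nxt])
            if found is not None:
                return found
        return None

    return extend([start])

# Hypothetical order-one pairs (x, y): x is in a relation of order one to y.
ONE = {("proper subset", "subset"), ("subset", "set"), ("subset", "element")}
print(k_chain(ONE, "proper subset", "set", 2))   # ['proper subset', 'subset', 'set']
MINUS_ONE = {(y, x) for x, y in ONE}
print(k_chain(MINUS_ONE, "set", "proper subset", 2))   # ['set', 'subset', 'proper subset']
```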


In Sections IV and V an information retrieval process is expressed
as a display of components of elements and k-chains of elements or
relations. The same process could be expressed in terms of operations
on certain matrices. One can construct a matrix M that represents the
structure of the information system (including the relations). For
every $n = 0, 1, 2, \ldots, N$ we put the ${}_nA$-dictionaries one after the other in the
sequence D, P, S, T, and C to form an n-list of elements. Then we
put the n-lists in the sequence $n = 0, 1, 2, \ldots, N$ and number the elements
successively from 1 to $\operatorname{card} I_N$. Now we define the elements of a matrix
M as follows: $m_{ij} = 1$ if $j \geq i$ and the i-th element of $I_N$ is in a
relation of order minus one to the j-th element. Otherwise $m_{ij} = 0$ for
$j \geq i$. The elements $m_{ij}$ for $j < i$ are defined by the relation

$$m_{ij} = -m_{ji},$$

i.e. M is skew-symmetric.
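A direct transcription of this definition, as a Python sketch: the elements are assumed to be listed already in the order just described, and the relations of order minus one are assumed (hypothetically) to be given as a set of ordered pairs (a, b), meaning that a is in a relation of order minus one to b. Because the diagonal of a skew-symmetric matrix must vanish, the sketch rejects a pair that relates an element to itself.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def structure_matrix(
    elements: Sequence[Hashable],
    order_minus_one: Set[Tuple[Hashable, Hashable]],
) -> List[List[int]]:
    """Build M: m[i][j] = 1 (j > i) when element i is in a relation of order
    minus one to element j, 0 elsewhere on and above the diagonal, and
    m[i][j] = -m[j][i] below the diagonal."""
    position = {element: k for k, element in enumerate(elements)}
    n = len(elements)
    m = [[0] * n for _ in range(n)]
    for a, b in order_minus_one:
        i, j = position[a], position[b]
        if i == j:
            raise ValueError(f"{a!r} is related to itself; M would not be skew-symmetric")
        if j > i:
            m[i][j] = 1
        # Pairs with j < i are ignored: the definition fixes the entries
        # below the diagonal by skew-symmetry, which the loop below applies.
    for i in range(n):
        for j in range(i):
            m[i][j] = -m[j][i]
    return m


# Hypothetical system: elements a and b are each in a relation of order
# minus one to element c.
print(structure_matrix(["a", "b", "c"], {("a", "c"), ("b", "c")}))
# [[0, 0, 1], [0, 0, 1], [-1, -1, 0]]
```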


This definition of M can be used to compare our information system
with the analysis presented in [7]. Since M is skew-symmetric, it is
sufficient to consider only its upper-triangular part. This part can be
partitioned into rectangular submatrices that represent the structural
relations of $J_n = I_n \setminus I_{n-1}$ (the set-theoretical difference of $I_n$ and
$I_{n-1}$) with $I_{n-1}$. Since the elements of $J_n$ are related only to the
elements of the form $d^\alpha_j$, one can omit the rows of every matrix obtained
by partitioning that correspond to the elements of $I_{n-1}$ other than the $d^\alpha_j$.
We denote the resulting matrix by $M_n$. Analysis of these matrices may
be employed to determine the connectedness of the information, the maximum
length of k-chains between two given elements, and many other questions
about the structure of the information. These problems are interesting
from an abstract viewpoint. However, when passing from $I_N$ to the matrix
M we lose the most important aspect of $I_N$, namely the components
$[a]$ of the elements in $I_N$. Without these verbal statements the system
loses its practical value as an information retrieval system.
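The passage from M to a submatrix $M_n$ can be sketched in the same way. In this Python sketch the level of each element and a predicate that tells whether an element is of the form $d^\alpha_j$ are assumed inputs (the names `level_of` and `is_concept` are hypothetical). The rows kept are the concept elements of $I_{n-1}$, and the columns are the elements of $J_n$.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def level_submatrix(
    m: List[List[int]],
    elements: Sequence[Hashable],
    level_of: Dict[Hashable, int],
    is_concept: Callable[[Hashable], bool],
    n: int,
) -> List[List[int]]:
    """Return M_n: rows for the concept elements of I_{n-1} (levels below n),
    columns for the elements of J_n (level exactly n)."""
    rows = [i for i, e in enumerate(elements) if level_of[e] < n and is_concept(e)]
    cols = [j for j, e in enumerate(elements) if level_of[e] == n]
    return [[m[i][j] for j in cols] for i in rows]


# Hypothetical system: d0_1 and d0_2 at level zero, d1_1 at level one, and
# d0_1 in a relation of order minus one to d1_1.
elements = ["d0_1", "d0_2", "d1_1"]
m = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
levels = {"d0_1": 0, "d0_2": 0, "d1_1": 1}
print(level_submatrix(m, elements, levels, lambda e: e.startswith("d"), 1))
# [[1], [0]]
```

The result is a 2 x 1 matrix, a miniature version of the 29 x 1 matrix $M_1$ described for the example in Section III.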

In summary, an information system consists of elements that are
septemplets whose successive components are: name (identification), set
of primary synonyms, set of elements in the antecedent (or generic term),
set of elements in the consequent (or qualifying statement), verbal
description of the element, bibliographical reference, and rescission
index. These septemplets are provided with labels or addresses and are
grouped into sets that constitute information of various levels,
beginning with level zero up to level N. The highest level N may
increase or decrease during the life of the system; this is discussed
in Section V. The elements of information of level zero are pairs
instead of septemplets. Furthermore, relations of order
$k$ ($k = -N, -N+1, \ldots, 0, 1, 2, \ldots, N$) among the elements of $I_N$, and k-chains
of elements and of natural relations, are defined. Finally, we
construct various dictionaries.
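One possible in-memory representation of a septemplet is sketched below in Python. The field names are our own labels for the seven components listed above, and the address key `"d6_6"` is hypothetical. Level-zero elements, which are only pairs, would need just an address and a term and are not modelled here. For illustration, the sample element uses the main term and primary synonyms of the imbeddable halfgroupoid from Tables II and III of Section III, the reference [9], and a rescission index of 100 as suggested for the example. Its antecedent, consequent and verbal statement are left empty for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Element:
    """A septemplet of the information system (field names are illustrative)."""

    name: str                                   # name (identification)
    primary_synonyms: FrozenSet[str] = frozenset()
    antecedent: FrozenSet[str] = frozenset()    # addresses of elements in the antecedent
    consequent: FrozenSet[str] = frozenset()    # addresses of elements in the consequent
    verbal_statement: str = ""                  # verbal description of the element
    reference: str = ""                         # bibliographical reference
    rescission_index: Optional[int] = None


# Elements are stored under their addresses; the key format is hypothetical.
system = {
    "d6_6": Element(
        name="Imbeddable halfgroupoid",
        primary_synonyms=frozenset({"Imbedded halfgroupoid", "Imbedding of a halfgroupoid"}),
        reference="[9]",
        rescission_index=100,
    )
}
```

Making the dataclass frozen keeps a stored element from being edited in place. Under this design, a change to an element would be made by storing a new septemplet.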

This definition of information was chosen with a discipline in
mathematics in mind. It is certainly not universally applicable.
However, this definition may be suitable for many other technical and
scientific disciplines. After all, one can say the same about many
research fields as is said on p. 186 of [8]: "Economic theory, not
unlike other theories, consists of three basic elements - definitions,
assumptions, and conclusions". Thus, in many fields observed facts,
data, and commonly used technical terms can be assigned to $I_0$, i.e. can
be treated as information of level zero. Definitions can be included
in the sets in the manner described above. Empirical relations
established between observed facts and collected data constitute
postulates of various levels. Hypotheses are conjectures. A theory and
its deductive conclusions are equivalent to theorems in mathematics.
Queries for information can also be interpreted in terms of the types
of questions in mathematics presented in Section IV below.


III.  EXAMPLE 


We illustrate the definition developed in the preceding section by
an example that has been constructed from the discussion contained in
Section I, Chapter 1, of [9]. The set ${}_0D$ of this example is rather
small. It contains only 29 elements, and these are listed in Table I.
This table is our ${}_0D$-dictionary. Obviously, in a specialized information
system, terms like single-valued mapping, range, and binary operation
need not be defined, i.e. they should be included in ${}_0D$. Instead, we
have chosen to consider these terms and many other commonly known
concepts as elements of information of higher level, since this example
is intended only to illustrate the principles outlined earlier.

Table I

$d^0_1$ = addition,
$d^0_2$ = cardinal number,
$d^0_3$ = cartesian product,
$d^0_4$ = collection,
$d^0_5$ = countable,
$d^0_6$ = difference of sets,
$d^0_7$ = element of,
$d^0_8$ = empty set,
$d^0_9$ = equal,
$d^0_{10}$ = finite,
$d^0_{11}$ = first element of the ordered pair,
$d^0_{12}$ = identity mapping of a set A onto set A,
$d^0_{13}$ = indexing set,
$d^0_{14}$ = integer,
$d^0_{15}$ = intersection,
$d^0_{16}$ = minimum,
$d^0_{17}$ = not an element of,
$d^0_{18}$ = not equal,
$d^0_{19}$ = one-one,
$d^0_{20}$ = ordered pair,
$d^0_{21}$ = parentheses (),
$d^0_{22}$ = proper subset,
$d^0_{23}$ = second element of the ordered pair,
$d^0_{24}$ = sequence,
$d^0_{25}$ = set,
$d^0_{26}$ = set of all,
$d^0_{27}$ = set of positive integers,
$d^0_{28}$ = subset,
$d^0_{29}$ = union.

With this choice of ${}_0D = I_0$, the total information contained in
Section I of [9] extends up to $I_{13}$. The sets of postulates and conjectures
are empty (there are no postulates or conjectures in this section). The sets
$S_8$ and $T_8$ are also empty. Some sets, such as ${}_1D$, ${}_3D$, ${}_9D$, and others,
contain only one element each. The next largest set after ${}_0D$ is ${}_5D$,
with twelve elements. Table II can be interpreted as a collection of
${}_kD$-dictionaries, $k = 1, 2, \ldots, 11$. This table contains the main terms of
each set in alphabetical order. The left column contains the addresses
of the corresponding elements. For simplicity, the subscripts i in these
addresses are written with a common denominator (a power of 2) that is
omitted. Thus an address written as $d^k_i$ really stands for $d^k_{i/2^m}$
for the appropriate power $2^m$. A similar simplification is adopted in
Table I and also in other notation of this example. Table III contains
the sets $(d^k_i)$ of primary synonyms that are not empty. In this example
all the sets of secondary synonyms are empty.
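The grouping of the terms in Table II appears consistent with a simple rule, which we state here only as an assumption: a term of level zero has no definition in the system, and any other term has a level one greater than the highest level among the terms used in its definition. A Python sketch of that rule follows, applied to a few definitions read off Table IV. The dependency sets are our own abbreviated reading of those definitions.

```python
from typing import Dict, Set


def compute_levels(uses: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign levels under the assumed rule: 0 for a term with no definition,
    otherwise 1 + the largest level among the terms used in its definition."""
    levels: Dict[str, int] = {}
    in_progress: Set[str] = set()

    def level(term: str) -> int:
        if term in levels:
            return levels[term]
        if term in in_progress:
            raise ValueError(f"circular definition involving {term!r}")
        in_progress.add(term)
        used = uses.get(term, set())
        value = 1 + max(level(t) for t in used) if used else 0
        in_progress.discard(term)
        levels[term] = value
        return value

    for term in uses:
        level(term)
    return levels


uses = {
    "single valued mapping": {"subset", "cartesian product", "element of", "equal"},
    "binary operation": {"set", "single valued mapping", "cartesian product"},
    "halfgroupoid": {"ordered pair", "set", "binary operation"},
    "order of a halfgroupoid": {"halfgroupoid", "cardinal number"},
    "countable halfgroupoid": {"halfgroupoid", "countable", "order of a halfgroupoid"},
}
print(compute_levels(uses)["countable halfgroupoid"])  # 5
```

Under this rule the five sample terms receive levels 1 through 5, matching their positions in Table II. A circular set of definitions is reported as an error rather than assigned a level.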
Table II

$d^1_1$  Single valued mapping

$d^2_1$  Binary operation
$d^2_2$  Inverse mapping
$d^2_3$  Range of a single valued mapping

$d^3_1$  Halfgroupoid

$d^4_1$  Compatible collection of halfgroupoids
$d^4_2$  Disjoint collection of halfgroupoids
$d^4_3$  Divisor in a halfgroupoid
$d^4_4$  Groupoid
$d^4_5$  Homomorphism of a halfgroupoid into a halfgroupoid
$d^4_6$  Order of a halfgroupoid
$d^4_7$  Subhalfgroupoid

$d^5_1$  Countable halfgroupoid
$d^5_2$  Divisor chain in a halfgroupoid
$d^5_3$  Endomorphism of a halfgroupoid
$d^5_4$  Extension of a halfgroupoid
$d^5_5$  Finite halfgroupoid
$d^5_6$  Homomorphism onto
$d^5_7$  Intersecting collection of halfgroupoids
$d^5_8$  Isomorphism
$d^5_9$  Prime element
$d^5_{10}$  Subgroupoid
$d^5_{11}$  Subhalfgroupoid closed in the halfgroupoid
$d^5_{12}$  Union of a compatible collection of halfgroupoids

$d^6_1$  Automorphism
$d^6_2$  Complete extension of a halfgroupoid
$d^6_3$  Disjoint collection of groupoids with amalgamated subgroupoids
$d^6_4$  Extension chain of halfgroupoids
$d^6_5$  Finite divisor chain over a halfgroupoid
$d^6_6$  Imbeddable halfgroupoid
$d^6_7$  Induced homomorphism
$d^6_8$  Intersection of an intersecting collection of halfgroupoids
$d^6_9$  Open extension of a halfgroupoid

$d^7_1$  Complete extension chain of halfgroupoids
$d^7_2$  Halfgroupoid free over its subhalfgroupoid
$d^7_3$  Length of a finite divisor chain
$d^7_4$  Maximal extension chain
$d^7_5$  Open extension chain

$d^8_1$  Subhalfgroupoid generates a subhalfgroupoid

$d^9_1$  Freely generated halfgroupoid

$d^{10}_1$  Free basis of a halfgroupoid
$d^{10}_2$  Free product of a disjoint collection of halfgroupoids
$d^{10}_3$  Generalized free product of a compatible collection of halfgroupoids

$d^{11}_1$  Free halfgroupoid
$d^{11}_2$  Generalized free product of a disjoint collection of groupoids with amalgamated subgroupoids

Table III

$(d^6_6)$ = (Imbedded halfgroupoid, Imbedding of a halfgroupoid),

$(d^6_7)$ = (Homomorphism extends a homomorphism, Extension of homomorphism),

$(d^8_1)$ = (Subhalfgroupoid of halfgroupoid generated by a subhalfgroupoid),

$(d^9_1)$ = (Halfgroupoid freely generates a halfgroupoid).

Table IV lists the verbal statements $[d^k_i]$.


Table IV

$[d^1_1]$ = Single valued mapping $f$ of a set $A$ into a set $B$, $(f,A,B)$, is a
subset of the cartesian product $A \times B$ such that $(a,b), (a,c) \in f$
implies $b = c$.

$[d^2_1]$ = Binary operation $f$ on a set $G$ is a single valued mapping
$(f, G \times G, G)$.

$[d^2_2]$ = Inverse mapping of a single valued mapping $(f,A,B)$ is a single
valued mapping $(f^{-1},B,A)$ such that $(b,a) \in f^{-1}$ if and only if
$(a,b) \in f$.

$[d^2_3]$ = Range of a single valued mapping $(f,A,B)$ is a subset $R(f)$ of $A$
such that $a \in R(f)$ implies that there exists $b \in B$ with
$(a,b) \in f$.

$[d^3_1]$ = Halfgroupoid $\mathcal{J}$ is an ordered pair $(J,f)$ whose first element
is a set $J$ and whose second element is a binary operation $f$ on
the set $J$.

$[d^4_1]$ = Compatible collection of halfgroupoids $\{(H_\alpha,f_\alpha) \mid \alpha \in A\}$ is a
set of halfgroupoids $(H_\alpha,f_\alpha)$ with $\alpha \in A$, where $A$ is an
indexing set, such that $\alpha, \beta \in A$ and $(a,b) \in R(f_\alpha) \cap R(f_\beta)$ imply
that $f_\alpha(a,b) = f_\beta(a,b)$.

$[d^4_2]$ = Disjoint collection of halfgroupoids $\{(H_\alpha,f_\alpha) \mid \alpha \in A\}$ is a set
of halfgroupoids $(H_\alpha,f_\alpha)$ with $\alpha \in A$, where $A$ is an indexing set,
such that $\alpha, \beta \in A$ and $H_\alpha \cap H_\beta \neq \emptyset$ imply that $\alpha = \beta$.

$[d^4_3]$ = If $a, b, c \in H$, then $a$ and $b$ are divisors of $c$ in a halfgroupoid
$(H,f)$ provided $c = f(a,b)$.

$[d^4_4]$ = Groupoid $\mathcal{J}$ is a halfgroupoid $(J,f)$ such that $R(f) = J \times J$.

$[d^4_5]$ = Homomorphism of $\mathcal{H} = (H,f)$ into $\mathcal{G} = (G,g)$ is a single
valued mapping $(\theta,H,G)$ such that $a, b, c \in H$ and $c = f(a,b)$
imply $\theta(c) = g(\theta(a),\theta(b))$.

$[d^4_6]$ = Order of a halfgroupoid $(H,f)$ is the cardinal number of $H$.

$[d^4_7]$ = Let $\mathcal{H} = (H,h)$ be a halfgroupoid and let $J \subset H$. Let
$g = \{(a,b) \mid a,b \in J \text{ and } (a,b) \in h\}$. Then $\mathcal{J} = (J,g)$ is a
subhalfgroupoid of $\mathcal{H}$.
$[d^5_1]$ = Halfgroupoid is countable when its order is countable.

$[d^5_2]$ = Let $\mathcal{H} = (H,h)$ be a halfgroupoid and let $\{a_i\}$ be a sequence of
elements of $H$. If $i \in N$ implies that $a_i$ is a divisor of $a_{i+1}$,
then $\{a_i\}$ is a divisor chain in $\mathcal{H}$.

$[d^5_3]$ = Endomorphism of a halfgroupoid $\mathcal{H}$ is a homomorphism of $\mathcal{H}$ into $\mathcal{H}$.

$[d^5_4]$ = Extension of a halfgroupoid $\mathcal{J} = (J,g)$ is a halfgroupoid
$\mathcal{H} = (H,h)$ such that $J \subset H$ and $R(h) \subset J \times J$.

$[d^5_5]$ = Halfgroupoid is finite when its order is finite.

$[d^5_6]$ = Homomorphism $\theta$ of a halfgroupoid $\mathcal{J}$ onto a halfgroupoid $\mathcal{H}$ is a
homomorphism of $\mathcal{J}$ into $\mathcal{H}$ such that $\theta(J) = H$.

$[d^5_7]$ = Intersecting collection of halfgroupoids $\{\mathcal{H}_\alpha \mid \alpha \in A\}$ is a
compatible collection of halfgroupoids $\mathcal{H}_\alpha = (H_\alpha,f_\alpha)$ such that
$\bigcap_{\alpha \in A} H_\alpha \neq \emptyset$.

$[d^5_8]$ = Isomorphism $\theta$ of halfgroupoids $\mathcal{J} = (J,g)$ and $\mathcal{H} = (H,h)$ is a
homomorphism of $\mathcal{J}$ into $\mathcal{H}$ that is a one-to-one single valued
mapping of $J$ onto $H$.

$[d^5_9]$ = Let $\mathcal{H} = (H,h)$ be a halfgroupoid and $a \in H$. Then $a$ is prime if
it has no divisors in $\mathcal{H}$.

$[d^5_{10}]$ = Subgroupoid of a halfgroupoid $\mathcal{H} = (H,h)$ is a subhalfgroupoid
$\mathcal{J} = (J,g)$ such that $\mathcal{J}$ is a groupoid.

$[d^5_{11}]$ = Subhalfgroupoid $\mathcal{J} = (J,g)$ of halfgroupoid $\mathcal{H} = (H,h)$ is closed
in $\mathcal{H}$ if $a, b \in J$ and $(a,b) \in R(h)$ imply that $(a,b) \in R(g)$.

$[d^5_{12}]$ = Union of a compatible collection of halfgroupoids $\{\mathcal{H}_\alpha = (H_\alpha,h_\alpha)\}$ is
a halfgroupoid $\mathcal{J} = (J,g)$ such that $J = \bigcup H_\alpha$ and $g(a,b) = c$ for
$a, b, c \in J$ if and only if there exists $\alpha$ such that $h_\alpha(a,b) = c$.

$[d^6_1]$ = Automorphism of a halfgroupoid $\mathcal{H}$ is an isomorphism of $\mathcal{H}$ and $\mathcal{H}$.

$[d^6_2]$ = Complete extension of a halfgroupoid $(J,g)$ is an extension
$(H,h)$ such that $J \times J \subset R(h)$.
$[d^6_3]$ = Disjoint collection of groupoids with amalgamated
subgroupoids is a disjoint collection of groupoids $\{\mathcal{G}_\alpha \mid \alpha \in A\}$
with a set of groupoids $\{\mathcal{J}_{\alpha\beta} \mid \alpha, \beta \in A \text{ and } \mathcal{J}_{\alpha\beta} \subset \mathcal{G}_\alpha\}$
such that $\mathcal{J}_{\alpha\beta}$ and $\mathcal{J}_{\beta\alpha}$ are isomorphic, and there exists a
compatible collection of groupoids $\{\mathcal{K}_\alpha = (K_\alpha,k_\alpha)\}$ such that
for every $\alpha \in A$ there exists an isomorphism $\varphi_\alpha$ of $\mathcal{G}_\alpha$ and $(K_\alpha,k_\alpha)$,
and $K_\alpha \cap K_\beta$ [remainder illegible in the source scan].

$[d^6_4]$ = Extension chain of halfgroupoids is a sequence $\{\mathcal{H}_i\}$ of
halfgroupoids such that $i \in N$ implies $\mathcal{H}_{i+1}$ is an extension
of $\mathcal{H}_i$.

$[d^6_5]$ = Divisor chain $\{a_i\}$ in a halfgroupoid $\mathcal{H}$ is finite over $\mathcal{H}$ if
there exists an integer $k$ such that for every natural number
$p$, $a_{k+p} = a_k$.

$[d^6_6]$ = Halfgroupoid $\mathcal{J}$ is imbeddable in halfgroupoid $\mathcal{H}$ if there exists
a subhalfgroupoid $\mathcal{K}$ of $\mathcal{H}$ isomorphic with $\mathcal{J}$.

$[d^6_7]$ = Let $\varphi$ be a homomorphism of a halfgroupoid $\mathcal{J} = (J,g)$ into a
halfgroupoid $\mathcal{H} = (H,h)$ and let $\mathcal{K} = (K,k)$ be a subhalfgroupoid
of $\mathcal{J}$. Let $\theta$ be a single valued mapping from $K$ into $H$ such
that for every $a \in K$, $\theta(a) = \varphi(a)$. Then $\theta$ is a homomorphism
induced by $\varphi$.

$[d^6_8]$ = Intersection of an intersecting collection of halfgroupoids
$\{(H_\alpha,h_\alpha) \mid \alpha \in A\}$ is a halfgroupoid $\mathcal{J} = (J,g)$ such that
$J = \bigcap_{\alpha \in A} H_\alpha$ and, for $(a,b) \in (J \times J) \cap R(h_\alpha)$, $g(a,b) = h_\alpha(a,b)$.

$[d^6_9]$ = Open extension of a halfgroupoid $(J,g)$ is an extension $(H,h)$
such that $(a,b) \in R(h)$ and $h(a,b) \in J$ imply that $(a,b) \in R(g)$
and $g(a,b) = h(a,b)$. Also, $(a',b'), (a,b) \in R(h)$, $(a,b) \notin R(g)$,
and $h(a',b') = h(a,b)$ jointly imply that $a = a'$ and $b = b'$.

$[d^7_1]$ = Complete extension chain of halfgroupoids $\{\mathcal{H}_i \mid i \in N\}$ is an
extension chain of halfgroupoids $\{\mathcal{H}_i \mid i \in N\}$ such that for
every $i$, $\mathcal{H}_{i+1}$ is a complete extension of $\mathcal{H}_i$.
$[d^7_2]$ = Halfgroupoid $\mathcal{H}$ is free over its subhalfgroupoid $\mathcal{J}$ provided
every homomorphism of $\mathcal{J}$ into a halfgroupoid $\mathcal{K}$ can be
extended to a homomorphism of halfgroupoid $\mathcal{H}$ into $\mathcal{K}$.

$[d^7_3]$ = Length of a finite divisor chain $\{a_k \mid k \in N\}$ over halfgroupoid
$\mathcal{J}$ is the integer $n = \min k$, where $k$ is such that for every
$p \in N$, $a_{k+p} = a_k$.

$[d^7_4]$ = Maximal extension chain of a halfgroupoid $(J,g)$ in a
halfgroupoid $(H,h)$ is an extension chain $\{(J_i,g_i) \mid i \in N\}$ such that
$(J_0,g_0) = (J,g)$, and $i \in N$ implies that (1) $J_{i+1} \subset H$ and (2)
for $x, y \in J_i$, $g_{i+1}(x,y) = z$ whenever $h(x,y) = z$.

$[d^7_5]$ = Open extension chain of halfgroupoids is an extension chain of
halfgroupoids $\{\mathcal{H}_i \mid i \in N\}$ such that for every integer $i$, $\mathcal{H}_{i+1}$ is
an open extension of $\mathcal{H}_i$.

$[d^8_1]$ = Subhalfgroupoid of halfgroupoid generated by a subhalfgroupoid.
Let $\mathcal{H}$ be a halfgroupoid and $\mathcal{J} \subset \mathcal{H}$. Let $\{\mathcal{J}_i \mid i \in N\}$ be a
maximal extension chain of $\mathcal{J}$ in $\mathcal{H}$ and let $\mathcal{K} = \bigcup \mathcal{J}_i$; then $\mathcal{K}$ is the
subhalfgroupoid of $\mathcal{H}$ generated by $\mathcal{J}$.

$[d^9_1]$ = Halfgroupoid $\mathcal{H}$ is freely generated by halfgroupoid $\mathcal{J}$ if $\mathcal{J}$
generates $\mathcal{H}$ and $\mathcal{H}$ is free over $\mathcal{J}$.

$[d^{10}_1]$ = Free basis of a halfgroupoid $(H,h)$ is a subset $B \subset H$ such
that $B$ freely generates $(H,h)$.

$[d^{10}_2]$ = Free product of a disjoint collection of halfgroupoids
$\{\mathcal{H}_\alpha \mid \alpha \in A\}$ is a groupoid $\mathcal{J}$ freely generated by $\bigcup_{\alpha \in A} \mathcal{H}_\alpha$.

$[d^{10}_3]$ = Generalized free product of a compatible collection of
halfgroupoids $\{\mathcal{H}_\alpha \mid \alpha \in A\}$ is a groupoid $\mathcal{J}$ freely generated by
$\bigcup_{\alpha \in A} \mathcal{H}_\alpha$.

$[d^{11}_1]$ = Halfgroupoid is free if it has a free basis.

$[d^{11}_2]$ = Generalized free product of a disjoint collection of groupoids
with amalgamated subgroupoids $\{\mathcal{G}_\alpha, \mathcal{J}_{\alpha\beta}, \varphi_\alpha \mid \alpha, \beta \in A\}$ is a
generalized free product of the compatible collection of
groupoids $\{\mathcal{K}_\alpha\}$.
Our wording is in most cases not an exact copy of that in [9].


Table V

[The entries of Table V are illegible in the source scan.]

Table V consists of the addresses (column 1) of all the elements of the
sets ${}_kD$, $k = 1, 2, \ldots, 11$, and it gives the corresponding antecedent and
consequent sets in columns 2 and 3, respectively. In our example we choose
the same reference for all the elements. For instance, we may write
$\{d^3_1\}$ = p.1, [9], where [9] stands for the complete reference as given at
the end of this report.

Assignment of some integer value, say 100, to every rescission index
completes the description of the set $D_{12}$ (with ${}_{12}D = \emptyset$) of our information.

Our example contains nineteen lemmas and theorems. Only one of
them has a name, namely the Free Representation Theorem. Therefore
we omit the corresponding list of addresses and names. The addresses
of the elements of the types $s^k_i$ and $t^k_i$ are contained in Table VI, which
gives the references $\{s^k_i\}$ and $\{t^k_i\}$ of these elements. The sets of
primary and secondary synonyms of these elements are all empty in our
example.


Table VI

(First part: the element addresses are illegible in the source scan; the
references are listed in their original order.)

p.2, [9]
Lemma 1.1, p.7, [9]
Lemma 1.2, p.3, [9]
Theorem 1.8, p.7, [9]
Lemma 1.4, p.4, [9]
Theorem 1.1, p.4, [9]
Theorem 1.1, p.4, [9]
Lemma 1.3, p.3, [9]
Lemma 1.5, p.6, [9]
Theorem 1.6, p.6, [9]
Theorem 1.2, p.5, [9]
Theorem 1.3, p.5, [9]

(Second part: references $\{t^{12}_i\}$; the subscripts are illegible in the
source scan.)

Lemma 1.5, p.6, [9]
Lemma 1.5, p.6, [9]
Theorem 1.4, p.6, [9]
Lemma 1.6, p.6, [9]
Theorem 1.5, p.6, [9]
Theorem 1.7, p.6, [9]
p.8, [9]

The second column of Table VII shows the antecedent sets that
correspond to the addresses in the first column. The third column
contains the elements of the corresponding consequent sets.


Table VII

[The entries of Table VII are illegible in the source scan.]

Next we give the verbal statements of these elements.

$[s^9_1]$. A subhalfgroupoid $\mathcal{J}$ of a halfgroupoid $\mathcal{H}$ generates $\mathcal{H}$ if and only
if every closed subhalfgroupoid of $\mathcal{H}$ containing $\mathcal{J}$ is equal to $\mathcal{H}$.

$[t^9_1]$. If a subhalfgroupoid $\mathcal{J}$ of a halfgroupoid $\mathcal{H}$ generates a
subhalfgroupoid $\mathcal{K}$ of $\mathcal{H}$ and $\mathcal{K}$ generates $\mathcal{H}$, then $\mathcal{J}$ generates $\mathcal{H}$.

$[t^9_2]$. If $\mathcal{J}$ is a generating subhalfgroupoid of a halfgroupoid $\mathcal{H}$ and if
$\varphi$ is a homomorphism of $\mathcal{J}$ into a halfgroupoid $\mathcal{K}$, then $\varphi$ can be extended
in at most one way to a homomorphism $\theta$ of $\mathcal{H}$ into $\mathcal{K}$.

$[t^9_3]$. If $\mathcal{J}$ is a finite halfgroupoid, then there exists a halfgroupoid
$\mathcal{H}$ generated by one element and such that $\mathcal{J}$ is imbeddable in $\mathcal{H}$.

$[s^{10}_1]$. A halfgroupoid $\mathcal{H}$ is freely generated by a subhalfgroupoid $\mathcal{J}$ if
and only if the following conditions hold:

(i) If $a$ is a prime in $\mathcal{J}$, then $a$ is prime in $\mathcal{H}$.

(ii) If $a \in H$, $a \notin J$, then $a = bc$ in $\mathcal{H}$ for one and only one
ordered pair $b, c \in H$.

(iii) Every divisor chain is either finite or finite over $\mathcal{J}$.

$[t^{10}_1]$. If $\mathcal{J}$ is a halfgroupoid, then there exists a groupoid $\mathcal{H}$ freely
generated by $\mathcal{J}$.

$[t^{10}_2]$. If a halfgroupoid $\mathcal{J}$ freely generates groupoids $\mathcal{H}$ and $\mathcal{K}$, then
there exists an isomorphism $\varphi$ of $\mathcal{H}$ onto $\mathcal{K}$ such that $\varphi$ is an identity
mapping on $\mathcal{J}$.

$[s^{12}_1]$. If a halfgroupoid $\mathcal{H}$ is an extension of a halfgroupoid $\mathcal{J}$, then
$\mathcal{H}$ is free over $\mathcal{J}$ if and only if $\mathcal{H}$ is an open extension of $\mathcal{J}$.

$[s^{12}_2]$. A halfgroupoid $\mathcal{H}$ is free if and only if the following
conditions hold:

(i) If $a \in H$ and if $a$ is not prime in $\mathcal{H}$, then $a = bc$ in $\mathcal{H}$ for
one and only one ordered pair $b, c \in H$.

(ii) Every divisor chain in $\mathcal{H}$ is finite.

$[t^{12}_1]$. A groupoid $\mathcal{G}$ is free if and only if $\mathcal{G}$ is a free product of free
groupoids of rank one.

$[t^{12}_2]$. If $\mathcal{J}$ is a groupoid, then there exists a free groupoid $\mathcal{H}$
isomorphic with $\mathcal{J}$.

$[t^{12}_3]$. If $\theta$ is a homomorphism of the groupoid $\mathcal{J}$ onto a free groupoid $\mathcal{G}$,
then there exist a subgroupoid $\mathcal{H}$ of $\mathcal{J}$ and an isomorphism $\varphi$ of $\mathcal{G}$ onto $\mathcal{H}$
such that $\theta(\varphi(x)) = x$ for every $x \in G$.

$[t^{12}_4]$. If $\mathcal{H}$ is a free halfgroupoid, then the set of all primes in $\mathcal{H}$ is
a free basis of $\mathcal{H}$.

$[t^{12}_5]$. If $\mathcal{H}$ is a free halfgroupoid and $B$ is a free basis of $\mathcal{H}$, then
$B$ is the set of all primes in $\mathcal{H}$.

$[t^{12}_6]$. If $\mathcal{K}$ is a subhalfgroupoid of a free halfgroupoid $\mathcal{H}$, then $\mathcal{K}$ is
free.

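$[t^{12}_7]$. Suppose $\mathcal{H} = (H,f)$ is a free halfgroupoid, $B \subset H$, $B \neq \emptyset$, and $B$
generates a closed subhalfgroupoid $\mathcal{K}_1$ of $\mathcal{H}$. If no proper subset of $B$
generates $\mathcal{K}_1$, then $B$ is a free basis of $\mathcal{K}_1$.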


$[t^{13}_1]$. If $\mathcal{H}$ is a free groupoid, then there exists a subgroupoid $\mathcal{J} \subset \mathcal{H}$
such that $\mathcal{J}$ is free and of countable order.

$[t^{13}_2]$. Let the groupoid $\mathcal{G}$ be freely generated by a subhalfgroupoid
$\mathcal{F} = (F,f)$ and let $\mathcal{H} = (H,h)$ be a subgroupoid of $\mathcal{G}$. Let $P$ be the set of
all primes of $\mathcal{H}$ which are not in $F \cap H$. Then one of the following holds:

(i) $P$ is empty, $F \cap H$ is not empty, and $\mathcal{H}$ is freely generated
by $F \cap H$; or

(ii) $F \cap H$ is empty, $P$ is not empty, and $P$ is a free basis of $\mathcal{H}$;
or

(iii) Neither $P$ nor $F \cap H$ is empty, and $\mathcal{H}$ is a generalized free
product of $\mathcal{P}$ and $\mathcal{K}$, where $\mathcal{P}$ is a free subgroupoid of $\mathcal{H}$ with free basis
$P$ and $\mathcal{K}$ is a subgroupoid of $\mathcal{H}$ freely generated by $F \cap H$.

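$[t^{13}_3]$. Let $\{\mathcal{H}_\alpha = (H_\alpha,h_\alpha) \mid \alpha \in A\}$ be a disjoint collection of groupoids
with amalgamated subgroupoids $\mathcal{K}(\alpha,\beta)$. Let $\theta_{\alpha\beta}$ be an isomorphism of
$\mathcal{K}(\alpha,\beta)$ onto $\mathcal{K}(\beta,\alpha)$. Suppose that $\theta_{\beta\alpha} = \theta_{\alpha\beta}^{-1}$ and that $x \in K(\alpha,\beta)$
together with $\theta_{\alpha\beta}(x) \in K(\beta,\gamma)$ imply that $x \in K(\alpha,\gamma)$ and
$\theta_{\alpha\gamma}(x) = \theta_{\beta\gamma}(\theta_{\alpha\beta}(x))$. Let $\mathcal{H} = (H,h)$ be the union of the $\mathcal{H}_\alpha$. For $a, b \in H$ let $a \equiv b$
whenever either $a = b$ or there exist $\alpha, \beta$ such that $\alpha \neq \beta$, $a \in K(\alpha,\beta)$,
$b \in K(\beta,\alpha)$, and $\theta_{\alpha\beta}(a) = b$. Let $S_a$ be a subset of $H$ such that for
every $b \in S_a$, $a \equiv b$, and for every $c \in H$ with $c \equiv a$, $c \in S_a$. Let
$S = \{S_a \mid a \in H\}$ and, for $x, y, z \in S$, let $g(x,y) = z$ provided that for some
$a \in x$, $b \in y$, and $c \in z$ we have $h(a,b) = c$. Let $\mathcal{H}_S = (S,g)$ and let $\bar{\mathcal{H}}$ be
a groupoid generated by the halfgroupoid $\mathcal{H}_S$. Then $\bar{\mathcal{H}}$ is isomorphic to a
generalized free product of a disjoint collection of groupoids with
amalgamated subgroupoids.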

Relations between the elements as defined in Section II can be
easily constructed from Tables V and VII. Also, k-chains of elements can
be obtained by consulting these tables. A k-chain of relations can be
written down by first constructing a corresponding k-chain of elements
and then taking the natural relations from the lists of verbal
statements of the elements. Also, the matrices $M_n$ that represent the
structure of $J_n = I_n \setminus I_{n-1}$ can be easily constructed from Tables VI
and VII. For instance, the matrix $M_1$ is a $29 \times 1$ matrix with elements in
rows 3, 7, 9, 11, 20, 25, 25, 26 equal to one and with zeros in the
remaining positions. The matrix [illegible] is [illegible] $\times$ 12. The row [illegible]
of Table VII indicates that the first column of this matrix will have a one in
rows [illegible], [illegible], and 73. The remaining elements of this column are zeros.
The other elements can be easily determined by examining Table VII and by
obtaining the sequential numbers of the elements $d^k_i$ from the combined sequence
of first columns in Tables I and V.

IV.  RETRIEVAL  OF  INFORMATION 

Our  information  system  consists  of  elements  and  certain  relations 
induced by the structure of the elements. Hence, these elements, their
components, and their relations are what we can retrieve. Of course, the
choice  of  the  elements  was  made  in  order  to  provide  a  retrieval  of 
information  that  would  satisfy  most  of  the  needs  of  a  researcher  in  the 
field.  We  hope  that  we  have  succeeded  to  some  degree.  In  fact,  the 
starting  point  for  the  choice  of  our  definition  was  a  collection  of 
likely  questions  that  one  may  want  to  ask. 

In  the  present  section  we  try  to  describe  these  questions  in  general 
terms  or  as  classes  of  questions.  Next  we  attempt  to  construct  the 
rules  for  obtaining  the  answers  by  retrieval  of  elements  of  our 
information,  their  components,  and  their  relations.  Of  course,  we 
assume  that  the  system  can  be  interrogated  by  formulating  new  requests 
for  information  according  to  responses  obtained  to  previous  questions. 

We  are  only  interested  in  principles  of  operation.  Therefore  in 
our  discussion  it  does  not  matter  how  retrieved  information  is  displayed. 
We may assume that the system can provide a temporary display of
retrieved information for scanning purposes as well as produce some sort
of a permanent copy.

It seems that answers to the following classes of questions should
satisfy many of the needs of the user of information in a discipline of
mathematics.


(α) How is a term c defined?

(β) Are terms b and c synonyms?

(γ) What are special cases of concept c?

(δ) What are generalizations of concept c?

(ε) Does there exist a concept d such that b and c are special
cases of d?

(ζ) Does there exist a concept d such that both b and c are
generalizations of d?

(η) Are statements q and r equivalent?

(θ) Does q imply r?

(ι) What is an exact wording of a theorem on subject q, if
any?

(κ) Under what conditions does q imply r?

(λ) Does there exist a theorem that relates q and r?

(μ) What are generalizations of theorem t?

(ν) What are special cases of theorem t?
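The four questions above that concern only concepts (special cases, generalizations, and a common generalization or common special case of two concepts) need nothing but the generic-term relation. A Python sketch follows, assuming hypothetically that each concept is mapped to the concepts named as its generic terms. The sample data are our own reading of a few definitions in Table IV of Section III.

```python
from typing import Dict, Set


def generalizations(c: str, generic: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable from c by repeatedly following generic terms."""
    found: Set[str] = set()
    stack = [c]
    while stack:
        for g in generic.get(stack.pop(), set()):
            if g not in found:
                found.add(g)
                stack.append(g)
    return found


def special_cases(c: str, generic: Dict[str, Set[str]]) -> Set[str]:
    """All concepts that have c among their generalizations."""
    return {x for x in generic if c in generalizations(x, generic)}


def common_generalizations(b: str, c: str, generic: Dict[str, Set[str]]) -> Set[str]:
    """Concepts d such that b and c are both special cases of d."""
    return generalizations(b, generic) & generalizations(c, generic)


def common_special_cases(b: str, c: str, generic: Dict[str, Set[str]]) -> Set[str]:
    """Concepts d such that b and c are both generalizations of d."""
    return special_cases(b, generic) & special_cases(c, generic)


generic = {
    "groupoid": {"halfgroupoid"},
    "subhalfgroupoid": {"halfgroupoid"},
    "subgroupoid": {"subhalfgroupoid", "groupoid"},
    "finite halfgroupoid": {"halfgroupoid"},
    "countable halfgroupoid": {"halfgroupoid"},
}
print(sorted(common_special_cases("groupoid", "subhalfgroupoid", generic)))
# ['subgroupoid']
```

The `found` set also keeps the search from looping if the generic-term relation happens to contain a cycle.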

Of course, question (θ) and many other questions on this list can
be asked about postulates or conjectures as well as about theorems. Also,
we may inquire about a symmetric theorem on our subject, not just any
theorem, i.e. we may want to know both necessary and sufficient
conditions or just sufficient conditions. Therefore we assume that the
user may choose to specify which of the sets $P_N$, $C_N$, $S_N$, and $T_N$ should
be examined.

The user may not be familiar with the dictionary of the information
system. Therefore he may want to begin his search with certain
preliminary questions that are equivalent to consultation of some sort
of "thesaurus". In our case a "thesaurus" consists of the main
dictionary of the system as well as of the ${}_kD$-, ${}_kP$-, ${}_kC$-, ${}_kS$-, and ${}_kT$-dictionaries
with $k = 1, 2, \ldots, N$, as described in Section II above. We
allow three classes of preliminary questions, which we call "dictionary
questions".

(1) Is the term c in the dictionary? This question may
include a request for synonyms and the main term of c.

(2) Which terms of the system $I_N$ appear in the definition
of the term c?

(3) Which terms are in the system? This question may be
about any of the following collections of terms:

(3i) All the terms in the system.

(3ii) All the terms in $D_N = D$, i.e. all the terms that
represent concepts as opposed to names of theorems and so on.

(3iii) All the terms contained in ${}_kD$ for some k.

(3iv) All the terms in $D_k$.

Of course, the same procedure that is used to examine the terms
in ${}_kD$ and $D_k$ can be applied to scan the terms in ${}_kA$ and $A_k$,
for A = P, C, T, or S.
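Dictionary question (1) amounts to a lookup in which every synonym points to the address of its main term. A Python sketch follows, with hypothetical address keys and with terms taken from Tables II and III of Section III. Case-insensitive matching is our own choice.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_index(main_terms: Dict[str, str], synonyms: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every main term and primary synonym (lower-cased) to its address."""
    index: Dict[str, str] = {}
    for address, term in main_terms.items():
        index[term.lower()] = address
        for synonym in synonyms.get(address, set()):
            index[synonym.lower()] = address
    return index


def look_up(
    term: str,
    main_terms: Dict[str, str],
    synonyms: Dict[str, Set[str]],
    index: Dict[str, str],
) -> Optional[Tuple[str, List[str]]]:
    """Answer question (1): None if the term is absent, otherwise the main
    term together with its synonyms in alphabetical order."""
    address = index.get(term.lower())
    if address is None:
        return None
    return main_terms[address], sorted(synonyms.get(address, set()))


main_terms = {
    "d3_1": "Halfgroupoid",
    "d6_6": "Imbeddable halfgroupoid",
    "d9_1": "Freely generated halfgroupoid",
}
synonyms = {
    "d6_6": {"Imbedded halfgroupoid", "Imbedding of a halfgroupoid"},
    "d9_1": {"Halfgroupoid freely generates a halfgroupoid"},
}
index = build_index(main_terms, synonyms)
print(look_up("imbedded halfgroupoid", main_terms, synonyms, index))
# ('Imbeddable halfgroupoid', ['Imbedded halfgroupoid', 'Imbedding of a halfgroupoid'])
```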

Questions (α) and (β) are questions on elements or their components.
We call these "element questions".

The remaining questions (γ)-(ν) are on relations of elements.
Questions (γ)-(ζ) are about relations between concepts. Consequently,
they do not involve any semantics and hence can be answered very easily.
The remaining questions involve statements q and r. These statements
may consist of a single term. For instance, the question "Which
topological spaces are metrizable?" is of class (κ) with q = topological
space and r = metrizable, i.e. both q and r consist of single
concepts. However, frequently q and r will be more complex. In our
opinion no increase in storage capacity and in processing speed of
present electronic devices and no sophistication of accompanying
software will produce a system able to read and interpret the semantics
of a living language with all its variations from person to person and
with its continuous growth and changes. One can build an automaton to
handle only a rigid, fixed, and limited language. Therefore automatic
answering of questions with semantics can be accomplished only for a
very rigid language. On the other hand, we should not expect that
every information seeker will learn this inflexible baby talk of a
computer. Therefore an automatic answering system should not depend on
semantics. We propose to limit semantics to a selection of the class
of a question. Since our list of classes contains only thirteen items,
(α) through (ν) above, it is not too much to ask the user to select the
class to which, in his opinion, his question belongs. The rest of the
semantics that is expressed by the linguistic form of statements q and
r is discarded by constructing two sets E and F that consist of concepts
in our main dictionary. The set E contains the terms that appear in
the statement q and the set F consists of terms in r. We may have a
very large number of different sets of the type E and F, but an
automatic system would have no difficulty in recognizing when such sets
are identical, when an inclusion relation holds, and when it does not
hold.

Of course, replacement of the statements q and r by the sets E
and F can result in retrieval of irrelevant information. For instance,
if we ask whether q implies r, specify the class of question (θ),
and at the same time replace q and r by E and F respectively, we may
retrieve a theorem "q' implies r'", where q' and r' are stated in the
terms contained in E and F respectively, and yet the retrieved theorem
may have no relation to our original question. However, it seems that
in a restricted field of science the incidence of such irrelevant
responses will not be very high, and the user will not have to waste
much time in sorting out irrelevant answers from relevant information.

We assume that the system constructs the sets E and F
automatically. A question of the user should be fed into the system in
its original semantic form. However, the user should specify which part
of the question contains the statement q and which part is a phrasing of
the statement r. The system should scan the question and construct the
sets E and F by including those terms that are in the main dictionary.

We assume that our dictionary includes words that are synonyms in the
technical jargon of the particular discipline, such as "metrizable" and
"metrizability". These may be our primary synonyms. Furthermore, the
system should be able to accept secondary synonyms, which, by the choice
of the designer of the system, may include certain grammatical variants of
primary synonyms such as the plural or possessive form, the past tense, or
the conjunctive mood. It is still possible that the user will use a form
that is not in the main dictionary. Such a term would be omitted in
constructing E and F. Hence the user should either carefully check the
vocabulary or else select judiciously the class to which his question
belongs. For instance, even with some terms omitted from E or F, a
correct answer may result if class (κ) is selected instead of class (θ).
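The construction of E and F from a question can be sketched in Python. This is a minimal illustration, not the authors' design: the dictionary entries and addresses are hypothetical, an address is modeled as a tuple (a, k, i), and multi-word terms are matched greedily (longest first), which is only one possible scanning rule.

```python
import re

# Hypothetical main dictionary: each main term, primary synonym and
# secondary synonym (lower-cased) maps to the address of its element.
MAIN_DICTIONARY = {
    "topological space": ("d", 1, 0.8),
    "topological spaces": ("d", 1, 0.8),   # secondary synonym (plural)
    "metrizable": ("d", 2, 0.5),
    "metrizability": ("d", 2, 0.5),        # primary synonym
}


def concept_set(phrase, dictionary=MAIN_DICTIONARY, max_words=3):
    """Return the set of addresses of dictionary terms found in phrase.

    Words that are not in the dictionary are silently dropped, so the
    linguistic form of the phrase is discarded.
    """
    words = re.findall(r"[a-z]+", phrase.lower())
    found, pos = set(), 0
    while pos < len(words):
        # Try the longest candidate term starting at this word first.
        for n in range(min(max_words, len(words) - pos), 0, -1):
            candidate = " ".join(words[pos:pos + n])
            if candidate in dictionary:
                found.add(dictionary[candidate])
                pos += n
                break
        else:
            pos += 1  # no term starts here: ignore the word
    return found


# The user marks which part of the question is q and which is r.
E = concept_set("which topological spaces")
F = concept_set("are metrizable")
```

Because "topological spaces" is stored as a secondary synonym, the plural in the question still resolves to the address of the main term, while "which" and "are" are dropped.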

In  our  opinion,  by  successive  choice  of  questions  one  can  obtain  the 
desired  information,  provided  it  is  contained  in  the  system. 

Of course, we do not think that a system should be designed to be capable
of answering questions about itself, such as how complete it is or what
the range of applicability of its information is. Nor should the
system be intended to supply any evaluation of its contents, such as the
complexity of the proofs of theorems.

The form of some questions listed above may suggest that the answers
are simply "yes" or "no". However, we assume that in many cases the
answer is more complete. For instance, the answer to (ζ) may include
a verbal statement of a particular theorem. Verbal statements need not
be restricted to synonyms included in the system. For instance, the
verbal statement $[s_f]$ of our example in the preceding section contains
the words "closed halfgroupoid of $\mathcal{H}$". Certainly, $\mathcal{H}$ is not a part of
our vocabulary; its meaning follows from the preceding sequence of words
in $[s_f]$. Besides a verbal statement, the user frequently may desire a
reference where he could find a proof of a theorem and also additional
references.


We can express the questions listed above in terms of requests for
elements of information, their components, and their relations. We will
use this formulation in the next section to construct rules of operation
that would retrieve the desired information. We begin with dictionary
questions.

Question (1) is a search for a main term that corresponds to a
chosen synonym or its variant. For this we must scan certain portions
of our dictionaries. The scanning may be facilitated if we know the
type of the term. For instance, we may want to search for a name of a
theorem or for a concept. Again, we may know that the concept is rather
basic and hence most likely is an element of information of low level.
Therefore, instead of searching in the $D^N$-dictionary we may choose to
scan the corresponding portion of the $D^k$-dictionary for some k < N. If no
dictionary is specified, then the system should scan the main dictionary.
If a match with the term c is found, the address $a_i^k$ and the corresponding
main term are given. If no such match is found, the answer is that the
term is not in the information. The user may specify which components
of the element should be displayed or printed. A request for the
third and fourth components would answer question (2) as stated above.
A similar search is performed in any other dictionary specified by the
user.

Dictionary question (3) can be answered by a visual scanning of a
specified portion of a selected dictionary. A dictionary is selected by
indicating a $D^k$-dictionary, a ${}^kD$-dictionary, etc. If a dictionary is not
specified, then the main dictionary is used. The portion of the
specified dictionary to be scanned is indicated by a set of, say, three
letters such as COL or DIS when it is desired to check
whether the term "disjoint collection of halfgroupoids" or "collection
of disjoint halfgroupoids" is in the information. (It happens that in
our example the first is a concept or a term; the second is a phrase or
a statement.) After a dictionary and a beginning searching place are
chosen, the system should display successive words in the dictionary
beginning at the specified place. At any point the user may terminate
the search and may request the display of any desired component of the
element that was displayed at the time of termination. Or else the user
may start his search in a different dictionary or at another place of
the same one.
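Dictionary question (3) can be sketched in Python, assuming an alphabetical dictionary held as a sorted list of (term, address) pairs; the entries and addresses are hypothetical. A binary search locates the beginning place given by the user's letters, and the user's decision to terminate is modeled simply as a fixed number of displayed entries.

```python
from bisect import bisect_left

# Hypothetical alphabetical dictionary: (term, address) pairs sorted by term.
DICTIONARY = sorted([
    ("halfgroupoid", ("d", 1, 0.6)),
    ("closed halfgroupoid", ("d", 3, 0.2)),
    ("disjoint collection of halfgroupoids", ("d", 3, 0.4)),
    ("collection of halfgroupoids", ("d", 2, 0.3)),
    ("groupoid", ("d", 1, 0.5)),
])
TERMS = [term for term, _ in DICTIONARY]


def scan(start, count):
    """Return up to `count` successive entries beginning at `start`.

    `start` stands for the letters the user types (e.g. "COL", "DIS");
    stopping after `count` entries models the user ending the scan.
    """
    pos = bisect_left(TERMS, start.lower())
    return DICTIONARY[pos:pos + count]
```

Scanning from "COL" shows "collection of halfgroupoids" but no "collection of disjoint halfgroupoids", which tells the user that the second wording is not a term of the system.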

The two element questions (α) and (β) can be answered by first
determining the address $a_i^k$ (or the addresses of b and c in the case of (β)).
These addresses can be obtained as in question (1). Identity of both
addresses would answer question (β) affirmatively. Different addresses
would mean that b and c are not synonyms. The answer to question (α)
would be obtained by retrieving the verbal statement $[a_i^k]$. Additionally,
one may desire to retrieve the reference $\{a_i^k\}$ or the synonyms $(a_i^k)$, i.e. to
retrieve the sixth or the second component of $a_i^k$.

The answer to question (γ) consists of the addresses of all the
elements $d_i^k$ of the set D for which $c = {}^kD_i$. This answer, at the choice of
the user, may contain addresses, or main terms, or verbal statements,
or any combination of these. Usually $c \in D$, and it may be desirable to
limit the search to elements that belong to $D^k$ for some k, or that belong
to the union of some collection of such sets.

This procedure would recover any special cases of the concept c that
are introduced by definitions. However, c may be a generalization of a
concept b by virtue of the semantic meaning of various definitions of
lower level in terms of which both c and b are defined, or by virtue
of the meaning of terms that belong to $D^0$, i.e. the terms that are not
defined in the information. The logical relation between b and c in
this case can be established by a theorem whose proof may be very
simple or very difficult. In any case a search for this type of special
case is a search for theorems either of the form "c implies b" or "every
c with the property f is a b". A search for this type of generalization
is a question of some class from (ζ) to (ν). Retrieving answers to these
questions is described below.
The answer to question (δ) is simply the third component ${}^kD_i$ of
$d_i^k = c$. The user may choose to repeat the retrieval of the third
component of the element contained in ${}^kD_i$, and so on. This would
produce higher-level generalizations if they exist. Again, in this
way generalizations introduced by definitions can be retrieved. Other
generalizations that are implied by the logical structure of the
discipline are of the type discussed above in connection with question
(γ). The remarks made there apply to this case also.

We obtain an answer to question (ε) by constructing positive chains
of elements from b and c, say, $\alpha_1, \alpha_2, \ldots$ and $\beta_1, \beta_2, \ldots$, with $b = \alpha_1$ and
$c = \beta_1$. We add alternately one element to each chain and compare the
added element with all the elements in the other chain that belong to
information of the same level until we find a matching pair of elements
or until both chains reach their maximum length. In the first case the
matching element is the concept d. In the second case a concept d with
the required properties does not exist.
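The alternating growth of two positive chains for question (ε) can be sketched as follows. For illustration only, the sketch assumes that the third component of each concept holds a single generalization, so that a positive chain is a simple path toward lower levels; the concept names and levels are hypothetical.

```python
# Hypothetical concepts: name -> (level k, generalization held in the third
# component). Concepts that are not defined in terms of another have None.
CONCEPTS = {
    "groupoid": (1, None),
    "lattice": (1, None),
    "halfgroupoid": (2, "groupoid"),
    "semigroup": (2, "groupoid"),
    "abelian halfgroupoid": (3, "halfgroupoid"),
    "monoid": (3, "semigroup"),
}


def common_generalization(b, c, concepts=CONCEPTS):
    """Return the concept d where the positive chains from b and c meet,
    or None if both chains reach their maximum length without meeting."""
    if b == c:
        return b
    chains = [[b], [c]]
    while True:
        grew = False
        for side in (0, 1):  # add one element to each chain alternately
            parent = concepts[chains[side][-1]][1]
            if parent is None:
                continue  # this chain has reached its maximum length
            chains[side].append(parent)
            grew = True
            level = concepts[parent][0]
            # Compare only with elements of the other chain at the same level.
            same_level = [x for x in chains[1 - side] if concepts[x][0] == level]
            if parent in same_level:
                return parent  # matching pair found: this is the concept d
        if not grew:
            return None  # both chains are maximal: no such d exists
```

The level filter mirrors the text: an element is compared only with elements of the other chain that belong to information of the same level.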

Question (η) is analogous to (ε). The answer can be obtained by
constructing negative chains of elements beginning with b and c.

Question (ζ) can be directed at postulates, conjectures, or theorems.
Hence the user should specify one of the sets ${}^kP$, $P^k$, ${}^kC$, $C^k$, ${}^kS$, or $S^k$
in which to search for a desired statement. After this the components
${}^kA_i$ and $A_i^k$ of the elements in the specified set are compared with E,
the set of concepts in q. In the case of equality of the two sets, say,
when ${}^kA_i = E$, the set $A_i^k$ is compared with F, the set of
concepts in r. If $F = A_i^k$, then the component $[a_i^k]$ is retrieved. The
user decides whether this is a relevant statement or not and,
accordingly, terminates his search or continues it further.

Question (θ) can be handled in the same way as (ζ), except that
the sets of the form ${}^kT$ and $T^k$ must be included in the search.
In the case of question (ι) one selects ${}^kA$ or $A^k$ as in (ζ), and then
the union of the components ${}^kA_i$ and $A_i^k$ of each element in the selected
set is compared with the set E. When an element $a_i^k$ is found such that
E is contained in ${}^kA_i \cup A_i^k$, the component $[a_i^k]$ is displayed or printed.
The user decides whether this is a satisfactory answer to his question.


Question (κ) requires a search for a theorem (or conjecture or
postulate) with the consequent r and with an antecedent that contains q.
In the search we compare F, the set of the concepts contained in r, with
the components ${}^kS_i$ and $S_i^k$ of the elements in ${}^kS$ or $S^k$ and with the components
${}^kT_i$ and $T_i^k$ of the elements in ${}^kT$ or $T^k$. When a component equal to F is found,
one tests whether the other of the two components, ${}^kA_i$ or $A_i^k$, contains E.
If it does, the component $[a_i^k]$ is retrieved.


Handling of question (λ) is similar to that of (κ), except that no
distinction is made between sets of type S and type T. Furthermore,
instead of searching for a component of an element that equals F, one
searches for a component that contains F as a subset.
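The set tests behind questions (ζ), (ι), (κ) and (λ) can be collected in one Python sketch. It is a simplification under a stated assumption: the third component of a statement element is treated as the concepts of its antecedent and the fourth as those of its consequent, whereas the text compares either component with E. The `Statement` record and the restriction to kinds of element are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical card for a postulate, conjecture or theorem."""
    kind: str               # "p", "c", "s" or "t"
    antecedent: frozenset   # assumed: concepts of the hypothesis (third component)
    consequent: frozenset   # assumed: concepts of the conclusion (fourth component)
    text: str               # fifth component, the verbal statement


def zeta(st, E, F):    # (ζ): antecedent set equals E, consequent set equals F
    return st.antecedent == E and st.consequent == F


def iota(st, E, F):    # (ι): E is contained in the union of both sets
    return E <= st.antecedent | st.consequent


def kappa(st, E, F):   # (κ): consequent equals F and antecedent contains E
    return st.consequent == F and E <= st.antecedent


def lam(st, E, F):     # (λ): consequent contains F and antecedent contains E
    return F <= st.consequent and E <= st.antecedent


def search(statements, predicate, E, F, kinds="pcst"):
    """Return the verbal statements of all matching elements of the chosen kinds."""
    return [st.text for st in statements
            if st.kind in kinds and predicate(st, E, F)]


th = Statement("t", frozenset({"compact", "metric space"}),
               frozenset({"separable"}), "Every compact metric space is separable.")
```

With E = {"metric space"} and F = {"separable"}, the theorem is found by (κ) because its antecedent contains E, but not by (ζ), which demands equality.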

In question (μ) let q be the antecedent of t, and let r be the
consequent. Generalization of t may be interpreted in the following
ways: (μ1) the number of conditions contained in q is reduced;
(μ2) the consequent r is changed to a consequent r' in such a way that r'
applies to a larger class of entities than r; (μ3) the antecedent q is
changed to an antecedent q' that applies to a larger class of elements
than q; (μ4) a combination of (μ1) and (μ2); (μ5) a combination of (μ2)
and (μ3). Usually, a reduction of the number of conditions contained
in q will increase the class of elements that satisfy the (reduced) set
of conditions. Hence (μ1) and (μ3) may seem to be overlapping.

However, we adopt the following interpretation of these cases that
makes them disjoint. We assume that in case (μ1) the change of q is
the elimination of some conditions. This means that the set E is
changed by removing some of its elements. In case (μ3) the number
of conditions is not changed, but some of the concepts in the conditions
are replaced by their generalizations. According to this interpretation,
case (μ1) is analogous to (κ). Case (μ3) can be handled in two
steps. First, the user selects the terms of E that he may want to
generalize and replaces them by generalizations of his choice. He may
use the procedure described under (δ) above to select such generalizations.
Then he can choose one of the procedures (ζ) - (κ). Case (μ2) is
analogous to case (μ3). Here again, first some terms in F are
replaced by their generalizations, and then a search of type (ζ) - (κ)
is applied. Similar combinations of previously described procedures
would handle cases (μ4) and (μ5). It should be noted that this
procedure may not yield the desired result even when a certain
generalization of a selected theorem is contained in the system.

However,  repeated  requests  for  a  certain  type  of  generalization  will 
lead  to  a  desired  theorem,  or  the  user  will  find  that  no  generalization 
in  a  chosen  direction  is  contained  in  the  system. 

Question (ν) is analogous to question (μ). Here again we consider
the following cases: (ν1) conditions added to the antecedent; (ν2)
particularization of some concepts in r; (ν3) particularization of some
concepts in q; (ν4) a combination of (ν1) and (ν2); (ν5) a combination of
(ν2) and (ν3). Again, particularization of concepts is chosen by the
user, probably with the aid of the procedure described in (γ). The
search is completed by choosing one of the procedures (ζ) - (κ).

V. PRINCIPLES FOR IMPLEMENTATION

An element of information consists of eight pieces: its address
$a_i^k$ and its seven components. The search procedures described in the
preceding section require direct access to any component and also to
the addresses contained in the third and fourth components of an element.

A system that satisfies these conditions could be constructed as a card
catalog, although some search procedures with the cards would be very
lengthy. Nevertheless we feel that a description of the principles in
terms of the concrete concepts of a card file may be easier to read than
an abstract discussion. Furthermore, access procedures for a card file
can easily be interpreted as procedures carried out by electronic data
processing equipment. Later in this section we will discuss an idealized
electronic system that, in our opinion, is feasible in principle,
although probably nobody would consider building such a system, at
least now.

Thus, we store each element of information on a card that contains
the address of the element and all its components. The file of all the
elements of information is arranged "lexicographically" with respect
to their addresses. For this purpose we interpret the address as a
three-dimensional vector (a, k, i) with a = d, p, c, s, or t. It is
convenient to assume the order of the values of a as just written. Of
any two elements with the same first component in their addresses, the
one with the smaller k precedes the one with the larger k, and so on.

We call this the main information file.
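The lexicographic arrangement of the main information file amounts to a sort key on (a, k, i), with the letters ranked in the order d, p, c, s, t given above. A minimal Python sketch; the sample addresses are hypothetical.

```python
# Rank of the first address component, in the order assumed in the text.
TYPE_ORDER = {"d": 0, "p": 1, "c": 2, "s": 3, "t": 4}


def sort_key(address):
    """Key that arranges addresses (a, k, i) lexicographically."""
    a, k, i = address
    return (TYPE_ORDER[a], k, i)


main_file = sorted([("t", 2, 0.5), ("d", 3, 0.1), ("d", 1, 0.9), ("p", 1, 0.2)],
                   key=sort_key)
```

A plain alphabetical sort would put c before d and p; the explicit rank table keeps the authors' chosen order.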

In order to answer our dictionary questions, (1) - (3) of the
preceding section, we need a set of "dictionary files". The main
dictionary would contain a card for each main term, and one for each
primary and secondary synonym. These cards should be arranged in
alphabetical order, and each should contain the corresponding address of
the element. Most of the theorems, conjectures, and postulates have no
names. Hence their main terms are identical with their addresses.
Therefore these terms need not be included in the main dictionary or any
other dictionary. To ease a search for a term we may construct any
number of other card dictionaries that would implement the ${}^kA$- and
$A^k$-dictionaries described in Section II. For instance, we may want to
have two additional copies of the main dictionary cards. One copy
would be divided into 5(N+1) parts according to the values of a and k,
and each part arranged in alphabetical order. Another copy could be
divided into N+1 parts according to the value of k, and each part arranged
alphabetically. We may also choose to construct ${}^kA$-dictionaries as well
as $A^k$-dictionaries for A = D, P, C, S, or T and for k = 0, 1, 2, ..., N. With
these files representing our dictionaries we can answer all the
dictionary questions of the preceding section. There is, of course,
a trade-off between the number of the various dictionaries and the
average duration of a search for an answer to dictionary questions.
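The two additional copies of the main dictionary can be sketched as partitions of one sorted list. This Python sketch assumes entries of the form (term, address) with address (a, k, i); the sample entries are hypothetical.

```python
from collections import defaultdict


def build_dictionaries(entries):
    """Build the main dictionary and two partitioned copies of it.

    entries: iterable of (term, address) pairs, address = (a, k, i).
    Returns (main, by_type_and_level, by_level). Every part stays in
    alphabetical order because the entries are sorted first.
    """
    main = sorted(entries)
    by_type_and_level = defaultdict(list)   # at most 5 * (N + 1) parts
    by_level = defaultdict(list)            # at most N + 1 parts
    for term, address in main:
        a, k, _ = address
        by_type_and_level[(a, k)].append((term, address))
        by_level[k].append((term, address))
    return main, dict(by_type_and_level), dict(by_level)


main, by_ak, by_k = build_dictionaries([
    ("semigroup", ("d", 2, 0.5)),
    ("groupoid", ("d", 1, 0.5)),
    ("associativity law", ("p", 1, 0.5)),
    ("halfgroupoid", ("d", 2, 0.3)),
])
```

Each extra copy costs storage but shortens the part that must be scanned, which is the trade-off noted above.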
An element question, (α) or (β) of the preceding section, can be
answered in two steps: first one finds the address of a chosen element
in the main dictionary or in any other dictionary, and then one selects
the corresponding card from the information file.


Question (γ) can be answered easily if we have a ${}^kD_i$-file, i.e. a
file of element cards of the type $d_i^k$ arranged lexicographically with
respect to the address contained in ${}^kD_i$. Those element cards that have
the same address in ${}^kD_i$ are arranged among themselves according to their
main address. Now the answer to question (γ) can be obtained by first
finding the address $d_j^m$ of c in a dictionary and then selecting the part
of the ${}^kD_i$-file with the address $d_j^m$ in ${}^kD_i$ and with k in the main address such
that the inequalities $m > k > m - i$ ($i$ is an integer less than or equal to $m$)
are satisfied. The selected cards will contain all the information that is
needed for question (γ).
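Sorting the D-file on the third component makes all special cases of c one consecutive run, so two binary searches replace a scan of the file. A Python sketch under hypothetical data: each card is reduced to (genus address, own address), and the level inequality of the text is left out, to be applied as a filter on the returned addresses if desired.

```python
from bisect import bisect_left, bisect_right

# Hypothetical D-file: one card per concept, (address in third component,
# own address), sorted first on the third component, then on the own address.
D_FILE = sorted([
    (("d", 1, 0.5), ("d", 2, 0.7)),   # semigroup -> groupoid
    (("d", 2, 0.3), ("d", 3, 0.2)),   # abelian halfgroupoid -> halfgroupoid
    (("d", 1, 0.5), ("d", 2, 0.3)),   # halfgroupoid -> groupoid
    (("d", 2, 0.7), ("d", 3, 0.6)),   # monoid -> semigroup
])
GENERA = [genus for genus, _ in D_FILE]


def special_cases(c):
    """Return the addresses of all cards whose third component is c.

    The cards form one consecutive run of the file, located by two binary
    searches instead of a scan of every card.
    """
    lo = bisect_left(GENERA, c)
    hi = bisect_right(GENERA, c)
    return [address for _, address in D_FILE[lo:hi]]
```

This is the card-file "connectedness by consecutive placement" in miniature.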


Question (δ) can be answered by selecting the address in the third
component of $d_j^m$, say $d_{j_1}^{m_1}$, then by selecting the address in the third
component of $d_{j_1}^{m_1}$, etc., until the desired level of information is reached.
This selection is conveniently carried out by using the main information
file.


Positive chains of elements that are required to answer
question (ε) can be constructed by repeated application of the
procedure described in (δ) above.

Negative chains of elements as required in question (η) can be
constructed by repeated application of the rules in (γ) above. Each
time that this procedure produces more than one element, we obtain
branching of a negative chain. The elements in all the branches of the
negative chains beginning with the element b must be compared with
the respective elements in the branches of the negative chains beginning with c.

In order to answer questions (ζ) - (ν) we construct four additional
files of the element cards, namely, a P-file, a C-file, an S-file, and a
T-file. Each of these files consists of code-punched cards as described
on pp. 11-18 of [1]. The addresses of the elements in ${}^kA_i$ and $A_i^k$ are
represented by the punched code in such a way that insertion of a
tumbler in a proper hole of the file permits us to retrieve all the cards
containing a specified address. Answers to questions (ζ) - (ν) are
obtained by testing whether a specified set E or F is a subset (proper
or improper) of ${}^kA_i$ or $A_i^k$. Thus, with the aid of a tumbler one can
first retrieve all the cards that contain the address of the first
element in E. Next the same procedure is applied to the retrieved
partial deck, and the cards that contain the address of the second element
of E are pulled out. A repetition of this procedure for the other
elements of E would produce the desired collection of cards.
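The repeated tumbler passes can be modeled in Python as successive filtering of a shrinking deck. This is only a software analogue of code-punched cards: each card is assumed to carry the set of addresses coded in its holes and notches, and the card layout is hypothetical.

```python
def tumbler_pass(deck, address):
    """One tumbler pass: pull out the cards coded with `address`."""
    return [card for card in deck if address in card["coded"]]


def sieve(deck, E):
    """Apply one pass per element of E to successively smaller decks."""
    for address in E:
        deck = tumbler_pass(deck, address)
        if not deck:
            break  # nothing left: E is not contained in any card
    return deck


deck = [
    {"name": "s1", "coded": {"d1", "d2", "d3"}},
    {"name": "s2", "coded": {"d1"}},
    {"name": "t1", "coded": {"d2", "d3"}},
]
```

The cards that survive every pass are exactly those whose coded set contains all of E, i.e. the subset test the text requires.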

The set D may be very large, and it may be impossible to provide
this type of punch code for the address of every element in D. However,
in principle, one can assume a sufficiently long card, or holes and
notches sufficiently small, to permit accommodation of all the necessary
addresses.

The aspect just described brings out an important point of this
system, namely the need to scan all the elements (cards) that contain
certain addresses. We may say that all the cards containing a specified
address must be connected in some sense. In order to obtain answers to
questions (γ) - (η) we provide this "connectedness" by placing the
corresponding cards consecutively in the ${}^kD_i$-file. For the remaining
cases we achieve "connectedness" by using a tumbler together with coded
holes and notches.

Just as in our human memory a concept such as, say, "groupoid" is
the same in whatever context of other concepts it appears, so also a
term in the information system should remain a single entity instead of
a multiplicity of duplicates, some of which can be changed or deleted
without any effect on the remaining duplicates. We have a number of
relations, such as the relation between a groupoid and its base or its
subgroupoid, but there is only one concept of groupoid. Hence, an
information system should contain one single copy of each element and a
number of relations represented by proper connections between the
elements. We may describe these connections in terms of some sort of
conductors and, thus, avoid multiplicities of individual concepts. Thus,
we assume that each element $d_j^m$ is connected to each set ${}^kA_i$ or $A_i^k$ of the
element $a_i^k$, provided the concept represented by $d_j^m$ appears in $[a_i^k]$.
With such connections present, requests of classes (α) to (ν) can
be answered without searching or scanning the irrelevant elements of
the system.

Such a system can be initially constructed and also expanded by
building all seven components of an element at one time. We recall that
our information system consists of two types of elements. Elements of
level zero have only two components, $d_i^0$ and $(d_i^0)$. When a new element
of this type is added to the system, its components (they consist of
words) are "read" into the system, and after this the respective
dictionaries are automatically updated. First, the $D^0$-dictionary is
updated by inserting the main term of the new element in its proper
position according to alphabetical order. Let $d_{i_1}^0$ and $d_{i_2}^0$ be the
addresses of the terms in the $D^0$-dictionary between which the new term
has been inserted. Then the value of i in the address of the new term
is $i = (i_1 + i_2)/2$. Of course, $i_1 = 0$ if the new term is first on the list and
$i_2 = 1$ if it is last. This completes the choice of the address for the
new element. Now its synonyms can be inserted in the proper places of
the remaining dictionaries. Finally, the translator of the system is
updated by incorporating a program that translates the new main term
and every synonym into the new address $d_i^0$.
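The choice of the index i for a new level-zero term can be sketched in Python. The sketch assumes that i is taken as the midpoint $(i_1 + i_2)/2$ of the neighbouring indices, with 0 and 1 standing in for a missing neighbour at either end. Exact fractions are used so that repeated insertions between the same neighbours never run out of precision, as floating-point numbers eventually would. The function name is hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction


def insert_level_zero(dictionary, term):
    """Insert term into the alphabetical D0-dictionary and return its index i.

    dictionary: sorted list of (term, i) pairs with 0 < i < 1, modified in place.
    Assumes i is the midpoint of the neighbours' indices, using 0 before
    the first entry and 1 after the last.
    """
    terms = [t for t, _ in dictionary]
    pos = bisect_left(terms, term)
    i1 = dictionary[pos - 1][1] if pos > 0 else Fraction(0)
    i2 = dictionary[pos][1] if pos < len(dictionary) else Fraction(1)
    i = (i1 + i2) / 2
    dictionary.insert(pos, (term, i))
    return i
```

Because i grows with alphabetical position, the addresses of level-zero terms keep the same order as the dictionary, and no existing address has to change when a term is added.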

A  system  should  have  a  provision  to  delete  elements  of  level  zero 
that  are  not  connected  to  any  other  elements  of  the  system.  The  rules 
for  this  procedure  are  described  below  in  this  section.  In  order  to 
avoid  deletion  of  a  newly  added  element,  the  introduction  of  an  element 
of  level  zero  must  be  accompanied  by  the  introduction  of  an  element  of 
higher level that contains the new level-zero term in its verbal
statement.



Elements of level higher than zero are constructed as follows.
First, the designer of the information system constructs all seven
components of the new element. The third and the fourth components are
sets of addresses of those concepts that appear in the fifth component
$[a_i^k]$ of the new element. All these concepts must be in the system
already. The designer chooses the first component a of the address of
the new element according to its type. The value of a and all seven
components are read into the system. The system determines
automatically the maximum $k_1$ of all the second components in the addresses
contained in the third and fourth components of the new element, and it
assigns $k = k_1 + 1$ as the second component of the new address. The
third component is obtained automatically with the aid of the
dictionary, as described above for elements of level zero. After this
all the dictionaries are updated, and a translator of the new main term
and its synonyms into the new address is added. Incorporation of the new
element into the system is completed by automatically connecting the sets
${}^kA_i$ and $A_i^k$ with the elements of the system whose addresses appear in
these sets.
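The level rule $k = k_1 + 1$ can be written as a small Python function. The representation is assumed: the third and fourth components are given as sets of addresses (a, k, i), and the check that every referenced concept is already in the system follows the text. The function name is hypothetical.

```python
def new_element_level(third, fourth, system_addresses):
    """Return the level k of a new element of level higher than zero.

    third, fourth: sets of addresses (a, k, i) of the concepts that appear
    in the verbal statement of the new element.
    system_addresses: addresses of all elements already in the system.
    """
    referenced = set(third) | set(fourth)
    missing = referenced - set(system_addresses)
    if missing:
        # All concepts used in the statement must already be in the system.
        raise ValueError(f"concepts not in the system: {sorted(missing)}")
    if not referenced:
        raise ValueError("an element of level > 0 must refer to some concept")
    k1 = max(k for (_, k, _) in referenced)
    return k1 + 1
```

Since k always exceeds the levels of everything the element refers to, every element is defined only in terms of elements of strictly lower level.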

Elimination of elements from the system is performed as follows.
We define a sequence of instants $t_1, t_2, \ldots$ either by specifying a
time increment $\Delta t$ between the successive instants or by choosing the
number of requests for information that are to be processed between
successive values of t. At every instant t, one is subtracted
automatically from the rescission index of every element in the
information. Whenever an answer to a request for information includes
the verbal statement $[a_i^k]$ or the reference $\{a_i^k\}$ of an element, one is
added to the rescission index of $a_i^k$. When the users collectively
have no interest in an element of the information, or when an element
becomes so commonly known that it will never or seldom be retrieved,
its rescission index will eventually become zero. Then an element of the
type p, c, s, or t is automatically removed from the system by deleting
its components, by removing the main term and synonyms from the
dictionaries, and by destroying its translator and all the connections
of this element with other elements of the information. After deletion
of an element of the type p, c, s, or t, the system examines all the
elements of $D^0$ that were connected to the deleted element. If an
element from this collection has no connections to any other elements
of the system, it is also deleted.
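The rescission bookkeeping can be sketched in Python. This is an illustration under assumptions: elements live in a dictionary keyed by address, connections are stored as sets of addresses on both sides, and the conversion of type-d elements of higher level into level-zero elements is not modeled. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical in-memory element with only the fields needed here."""
    kind: str                 # "d", "p", "c", "s" or "t"
    level: int                # k, the second component of the address
    rescission: int           # the rescission index
    links: set = field(default_factory=set)  # addresses of connected elements


class Information:
    def __init__(self, elements):
        self.elements = elements  # dict: address -> Element

    def retrieved(self, address):
        """An answer included the verbal statement or reference: add one."""
        self.elements[address].rescission += 1

    def tick(self):
        """At each instant, subtract one from every rescission index, then
        remove elements of type p, c, s or t whose index has reached zero."""
        for element in self.elements.values():
            element.rescission -= 1
        expired = [address for address, e in self.elements.items()
                   if e.kind in "pcst" and e.rescission <= 0]
        for address in expired:
            self._delete(address)

    def _delete(self, address):
        element = self.elements.pop(address)
        for other in element.links:
            neighbour = self.elements.get(other)
            if neighbour is None:
                continue
            neighbour.links.discard(address)  # destroy the connection
            # A level-zero concept left with no connections is deleted too.
            if neighbour.kind == "d" and neighbour.level == 0 and not neighbour.links:
                del self.elements[other]


info = Information({
    ("d", 0, 0.5): Element("d", 0, 5, {("t", 1, 0.5)}),
    ("d", 0, 0.7): Element("d", 0, 5, {("t", 1, 0.5), ("s", 1, 0.2)}),
    ("t", 1, 0.5): Element("t", 1, 1, {("d", 0, 0.5), ("d", 0, 0.7)}),
    ("s", 1, 0.2): Element("s", 1, 3, {("d", 0, 0.7)}),
})
```

After one instant the theorem at (t, 1, 0.5) expires; the concept at (d, 0, 0.5) loses its only connection and is deleted with it, while (d, 0, 0.7) survives because it is still connected to (s, 1, 0.2).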

When the rescission index of an element $d_i^k$ becomes zero, the element
is converted into an element $d^0$. The first two components of $d_i^k$
become components of the new $d_j^0$. This new element is incorporated
into the system by the rules described above for incorporating new
elements into D. At the same time the system constructs all the negative
chains of elements that begin with the old element $d_i^k$ and replaces
the address $d_i^k$ in the third and second components of all the
elements in these chains by $d_j^0$. After this the second components in
the addresses of each element in these chains are computed in succession,
starting with the elements of lowest level. This computation is
performed by the rules that are available for incorporation of new
elements of level higher than zero. If the second component in the
address of an element $a^m$ belonging to one of these chains is not
changed, then all the addresses in the elements of the chains that are in
negative relation to $a^m$ remain the same, provided there is no other
negative chain that leads from the element $d_j^0$ to those elements. If,
however, the second component of the address $a^m$ is changed, then the
addresses of the elements in the negative chains that begin with $a^m$
are adjusted by the same rules.
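The recomputation order can be sketched as an incremental update over the negative chains. This sketch assumes, as the level-by-level order suggests, that every element in negative relation to another element has a strictly higher level. The rules for computing a second component are not reproduced in this section, so they are passed in as a hypothetical `recompute` callback. The names `propagate`, `successors`, `level` and `second` are illustrative only.

```python
from __future__ import annotations

import heapq
from typing import Callable


def propagate(
    start: str,
    successors: dict[str, list[str]],
    level: dict[str, int],
    second: dict[str, int],
    recompute: Callable[[str], int],  # hypothetical stand-in for the report's rules
) -> set[str]:
    """Recompute second address components along negative chains from `start`.

    successors[a] lists the elements in negative relation to a (assumed to
    have strictly higher level than a). `second` is updated in place; the
    addresses whose second component changed are returned.
    """
    heap: list[tuple[int, str]] = []
    queued: set[str] = set()

    def schedule(address: str) -> None:
        # Queue each element once; the lowest level is popped first.
        if address not in queued:
            queued.add(address)
            heapq.heappush(heap, (level[address], address))

    for nxt in successors.get(start, []):
        schedule(nxt)

    changed: set[str] = set()
    while heap:
        _, address = heapq.heappop(heap)
        new_value = recompute(address)
        if new_value != second[address]:
            second[address] = new_value
            changed.add(address)
            # Only a changed element forces its own successors to be redone;
            # they may still be reached through another changed chain.
            for nxt in successors.get(address, []):
                schedule(nxt)
    return changed
```

With a toy rule such as "one more than the largest second component among an element's predecessors", an element whose value comes out unchanged stops propagation along its own chain. This matches the saving the text describes.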


A question of any type considered in the previous section can
readily be answered by this system. Suppose we have a question of type
(1) about a subject described by the sentence q. The user specifies the
class of the question and feeds the actual question into the system.
As successive words in the question are read in, the translator ignores
all those words that do not belong to the main dictionary and translates
the others into their addresses, thus generating the set E. The
number of elements in E is counted at the same time. Let this number be
n. Now all the elements with addresses in E are activated, and they in
turn send activation signals to all the elements in I that are connected
to these elements. At the same time a printer is set to accept signals
from all the elements in I that have exactly n active connectors in
$A_i^c \cup A_i^k$. The user may specify which components of these
elements should be printed or displayed. Similarly, a simple procedure can
be described for retrieving an answer to any other class of questions.
These procedures are analogs of the rules described above for a card
file, and we omit the repetition here.
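The activation-counting procedure just described can be sketched as follows. This minimal version assumes that Python dictionaries stand in for the main dictionary (word → address) and for the connections from each address to the elements of I. The function and parameter names are hypothetical. Counting a repeated word only once is also an assumption about how n is counted. The test for "exactly n active connectors" becomes a simple count comparison.

```python
from __future__ import annotations

from collections import Counter


def answer_question(
    question: str,
    dictionary: dict[str, str],          # hypothetical: word -> address
    reverse_links: dict[str, list[str]],  # hypothetical: address -> connected elements of I
) -> list[str]:
    """Return the elements of I connected to every dictionary term in `question`."""
    # Translator: ignore words outside the main dictionary, map the rest to addresses.
    E: set[str] = set()
    for raw in question.split():
        word = raw.strip(".,;:?!\"'()").lower()
        if word in dictionary:
            E.add(dictionary[word])
    n = len(E)  # counted while the question is read in
    if n == 0:
        return []
    # Each activated address signals the elements of I connected to it.
    active: Counter[str] = Counter()
    for address in E:
        for element in reverse_links.get(address, []):
            active[element] += 1
    # The "printer" accepts only elements with exactly n active connectors.
    return sorted(el for el, count in active.items() if count == n)
```

For example, if "entropy" and "gas" map to two addresses and only one element of I is connected to both, the question "What is the entropy of a gas?" returns just that element.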

The connection between elements need not be physical. It is
enough to be able to activate simultaneously all "replicas" of a given
element, i.e. to make each concept truly one entity in the system. Such
an automatic connection can be visualized by assuming that the same
address (which is just a code name for an element), wherever it occurs in
the system, responds to, say, a specified electromagnetic frequency. Then
the required "connections" to a new element would be established
automatically once its sensitivity to the respective frequencies is
introduced. In this abstract picture the information system would be
just a collection of structured elements stored in any order in a
certain location, with all the necessary relations and connections
provided by their sensitivity to appropriate frequencies and their
ability to transmit the signals.
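The frequency picture amounts to associative (content-addressable) lookup. The sketch below is a software stand-in that assumes each stored element simply lists the addresses it responds to, and it simulates broadcasting an address with a scan. The class names are hypothetical. A physical realization would have all replicas respond in parallel rather than one after another.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StoredElement:
    name: str
    responds_to: frozenset[str]  # addresses ("frequencies") the element reacts to


class BroadcastStore:
    """Hypothetical store: elements kept in arbitrary order, links only implicit."""

    def __init__(self) -> None:
        self._items: list[StoredElement] = []

    def store(self, element: StoredElement) -> None:
        # No link bookkeeping: declaring the sensitivities is enough.
        self._items.append(element)

    def broadcast(self, address: str) -> list[str]:
        # A sequential scan stands in for all replicas responding at once.
        return [e.name for e in self._items if address in e.responds_to]
```

Storing a new element that responds to an existing address "connects" it to every other element sensitive to that address, and no existing element has to be touched. This is the property the paragraph emphasizes.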

VI.  CONCLUDING  REMARKS 

The  preceding  discussion  is  not  meant  to  be  a  solution  to  the 
problem  of  information  retrieval.  It  is  rather  just  a  formulation  of 
the  problem  with  an  analysis  of  a  possible  approach  to  its  solution.  A 
further  study  of  the  problem  is  needed.  It  is  very  likely  that  a 
careful  examination  of  the  needs  of  information  users  will  lead  to  a 
more  comprehensive  list  of  classes  of  questions  than  the  one  discussed 
in  Section  III.  Additional  classes  and  a  need  for  more  precise  retrieval 
may  require  redefinition  of  the  elements  of  information.  It  may  be  that 
information  should  consist  of  elements  with  more  than  seven  components 
and  of  much  more  complex  structure  than  that  described  above.  However, 
we feel that any approach with some hope of success must concentrate on
information  instead  of  documents.  This  is  the  main  thesis  of  our  report. 

Of  course,  construction  of  an  information  system  of  this  type  would 
be  rather  expensive.  Nevertheless,  in  view  of  the  explosive  proliferation 
of  research  publications,  a  great  number  of  which  contain  only  noise, 
construction  of  an  information  system  may  be  less  expensive  than  mere 
sorting  and  labeling  of  documents.  If  we  try  to  estimate  the  cost  of 
time  spent  by  professionals  while  searching  through  irrelevant  documents 
for desired information; the cost of lost information, i.e. information
that  is  not  retrieved  when  needed;  the  cost  of  publishing  all  the 
jumble  that  is  presented  as  original  research;  and  above  all,  the  cost 
of  producing  all  those  mountains  of  gibberish,  then,  maybe,  we  will 
find  that  the  expenses  in  professional  effort  and  in  money  for  building 
an  adequate  information  system  are  relatively  modest.  Readily  accessible 
information  that  includes  only  significant  and  original  results,  we 
believe,  would  discourage  production  of  less  than  mediocre  publications. 

A  system  that  is  able  to  incorporate  new  results  at  a  sufficiently  early 
date  may  even  lead  to  reduction  of  the  number  of  research  journals, 
saving  thereby  the  professional  effort  of  reviewers  and  editors  and  also 
the  cost  of  publishing.  In  any  case,  we  believe  that  further  study  of 
feasibility  and  of  methods  for  constructing  an  information  system  should 
be  conducted.  Examples  of  information  sets  should  be  developed,  and  the 
main  problems  in  their  construction  should  be  identified  for  further 
analysis.